SPECIFICATION


Seed coat specific DNA regulatory region and peroxidase 
The present invention relates to a novel DNA molecule comprising a plant seed 
coat specific DNA regulatory region and a novel structural gene encoding a peroxidase. 
The seed-coat specific DNA regulatory region may also be used to control the 
expression of other genes of interest within the seed coat. 


BACKGROUND OF THE INVENTION 


Full citations for references appear at the end of the Examples section. 


Peroxidases are enzymes catalyzing oxidative reactions that use H2O2 as an
electron acceptor. These enzymes are widespread and occur ubiquitously in plants as
isozymes that may be distinguished by their isoelectric points. Plant peroxidases
contribute to the structural integrity of cell walls by functioning in lignin biosynthesis
and suberization, and by forming covalent cross-linkages between extensin, cellulose,
pectin and other cell wall constituents (Campa, 1991). Peroxidases are also associated
with plant defence responses and resistance to pathogens (Bowles, 1990;
Moerschbacher, 1992). Soybeans contain three anionic isozymes of peroxidase with a
minimum Mr of 37 kDa (Sessa and Anderson, 1981). Recently one peroxidase
isozyme, localised within the seed coat of soybean, has been characterized with a Mr
of 37 kDa (Gillikin and Graham, 1991).

In an analysis of soybean seeds, Buttery and Buzzell (1968) showed that the
amount of peroxidase activity present in seed coats may vary substantially among
different cultivars. The presence of a single dominant gene Ep causes a high seed coat
peroxidase phenotype (Buzzell and Buttery, 1969). Homozygous recessive epep plants
are ~100-fold lower in seed coat peroxidase activity. This results from a reduction in
the amount of peroxidase enzyme present, primarily in the hourglass cells of the
subepidermis (Gijzen et al., 1993). In plants carrying the Ep gene, peroxidase is
heavily concentrated in the hourglass cells (osteosclereids). These cells form a highly
differentiated cell layer with thick, elongated secondary walls and large intercellular
spaces (Baker et al., 1987). Hourglass cells develop between the epidermal
macrosclereids and the underlying articulated parenchyma, and are a prominent feature
of seed coat anatomy at full maturity. The cytoplasm exudes from the hourglass cells
upon imbibition with water and a distinct peroxidase isozyme constitutes 5 to 10%
of the total soluble protein in EpEp seed coats. It is not known why the hourglass cells
accumulate large amounts of peroxidase, but the sheer abundance and relative purity
of the enzyme in soybean seed coats are significant because peroxidases are versatile
enzymes with many commercial and industrial applications. Studies of soybean seed
coat peroxidase have shown this enzyme to have useful catalytic properties and a high
degree of thermal stability even at extremes of pH (McEldoon et al., 1995). These
properties result in the preferred use of soybean peroxidase, over that of horseradish
peroxidase, in diagnostic assays as an enzyme label for antigens, antibodies,
oligonucleotide probes, and within staining techniques. Johnson et al. report on the use
of soybean peroxidase for the deinking of printed waste paper (U.S. 5,270,770;
December 6, 1994) and for the biocatalytic oxidation of primary alcohols (U.S.
5,391,488; February 13, 1996). Soybean peroxidase has also been used as a
replacement for chlorine in the pulp and paper industry, or as a formaldehyde
replacement (Freiberg, 1995).

An anionic soybean peroxidase from seed coats has been purified (Gillikin and
Graham, 1991). This protein has a pI of 4.1 and Mr of 37 kDa. A method for the
bulk extraction of peroxidase from seed hulls of soybean using a freeze-thaw technique
has also been reported (U.S. 5,491,085, February 13, 1996, Pokara and Johnson).

Lagrimini et al. (1987) disclose the cloning of a ubiquitous anionic peroxidase
in tobacco encoding a protein of Mr 36 kDa. This peroxidase has also been
overexpressed in transgenic tobacco plants (Lagrimini et al., 1990) and Maliyakal
discloses the expression of this gene in cotton (WO 95/08914).

Huangpu et al. (1995) reported the partial cloning of a soybean anionic seed coat
peroxidase. The 1031 bp sequence contained an open reading frame of 849 bp
encoding a 283 amino acid protein with a Mr of 30,577. The Mr of this peroxidase is
7 kDa less than what one would expect for a soybean seed coat peroxidase as reported
by Gillikin and Graham (1991) and possibly represents another peroxidase isozyme
within the seed coat.

The upstream promoter sequences for two poplar peroxidases have been
described by Osakabe et al. (1995). A number of characteristic regulatory sites were
identified from comparison of these sequences to existing promoter elements.
Additionally, a cryptic promoter with apparent specificity for seed coat tissues was
isolated from tobacco by a promoter trapping strategy (Fobert et al., 1994). The
upstream regulatory sequences associated with the Ep gene in soybean are distinct from
these and other previously characterized promoters. The soybean Ep promoter drives
high-level expression in a cell and tissue specific manner. The peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the
hourglass cells of the subepidermis. Minimal expression of the gene is detected in root
tissues.

One problem arising from the desired use of soybean seed coat peroxidase is 
that there is variability between soybean varieties regarding peroxidase production 
(Buttery and Buzzell, 1986; Freiberg, 1995). Due to the commercial interest in the use 
of soybean seed coat peroxidase, new methods of producing this enzyme are required.
Therefore, the gene responsible for the expression of the 37 kDa isozyme in soybean
seed coat was isolated and characterized.

Furthermore, novel regulatory regions obtained from the genomic DNA of
soybean seed coat peroxidase have been isolated and characterized and are useful in
directing the expression of genes of interest in seed coat tissues. 


SUMMARY OF THE INVENTION 

The present invention relates to a DNA molecule that encodes a soybean seed 
coat peroxidase and associated DNA regulatory regions. 

This invention also embraces isolated DNA molecules comprising the nucleotide
sequence of either SEQ ID NO:1 (the cDNA encoding soybean seed coat peroxidase)
or SEQ ID NO:2 (the genomic sequence).

This invention also provides for a chimeric DNA molecule comprising a seed 
coat-specific regulatory region having nucleotides 1-1532 of SEQ ID NO: 2 and a gene 
of interest under control of this DNA regulatory region. Also included within this 
invention are chimeric DNA molecules comprising genomic DNA sequences 
exemplified by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2. 
Furthermore, this invention is directed to isolated DNA molecules comprising at least 

1) 24 contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID 
NO:2; 

2) 32 contiguous nucleotides selected from nucleotides 412-1041 of SEQ 
ID NO:2; 

3) 23 contiguous nucleotides selected from nucleotides 1234-2263 of SEQ 
ID NO:2; or 

4) 22 contiguous nucleotides selected from nucleotides 2430-2691 of SEQ 
ID NO:2. 
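
As an illustrative aside, the "at least N contiguous nucleotides" criterion listed above can be read as a shared-substring test. The following is a minimal sketch under stated assumptions: the function name `shares_contiguous_run` and the toy sequences are hypothetical and are not part of the disclosure, coordinates are taken to be 1-based and inclusive, and only the given strand is compared (reverse complements are ignored).

```python
def shares_contiguous_run(candidate: str, reference: str,
                          start: int, end: int, min_len: int) -> bool:
    """Return True if `candidate` contains at least `min_len` contiguous
    nucleotides taken from reference[start..end] (1-based, inclusive).

    Hypothetical helper for illustration; only the given strand is checked.
    """
    region = reference[start - 1:end].upper()
    candidate = candidate.upper()
    if min_len <= 0 or len(region) < min_len or len(candidate) < min_len:
        return False
    # Every window of length min_len found in the reference region.
    windows = {region[i:i + min_len] for i in range(len(region) - min_len + 1)}
    # Slide the same window along the candidate and look for a shared one.
    return any(candidate[i:i + min_len] in windows
               for i in range(len(candidate) - min_len + 1))


# Toy data only: a made-up 40-nt "reference", NOT SEQ ID NO:2.
toy_reference = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGGATCCGTAA"
probe = "TTTT" + toy_reference[5:29] + "CCCC"   # embeds 24 nt of the region
unrelated = "ACGT" * 10

print(shares_contiguous_run(probe, toy_reference, 1, 40, 24))
print(shares_contiguous_run(unrelated, toy_reference, 1, 40, 24))
```

With a real sequence, the region bounds and minimum length would be taken from the relevant item of the list (for example 1-1532 with a minimum of 24).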

The present invention also provides for vectors which comprise DNA molecules
encoding soybean seed coat peroxidase. Such a construct may include the DNA 
regulatory region from SEQ ID NO:2, including nucleotides 1-1532, or at least 24 
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO: 2 in 
conjunction with the seed coat peroxidase gene, or the seed coat peroxidase gene under 
the control of any suitable constitutive or inducible promoter of interest. 

This invention is also directed towards vectors which comprise a gene of 
interest placed under the control of a DNA regulatory element derived from the 
genomic sequence encoding soybean seed coat peroxidase. Such a regulatory element 
includes nucleotides 1-1532 of SEQ ID NO:2, or at least 24 contiguous nucleotides 
selected from nucleotides 1-1532 of SEQ ID NO:2. Elements comprising nucleotides 
412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2, or 32 contiguous nucleotides 
selected from nucleotides 412-1041 of SEQ ID NO:2, 23 contiguous nucleotides 
selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22 contiguous nucleotides 
selected from nucleotides 2430-2691 of SEQ ID NO:2 may also be used. 

This invention also embraces prokaryotic and eukaryotic cells comprising the 
vectors identified above. Such cells may include bacterial, insect, mammalian, and 
plant cell cultures.

This invention also provides for transgenic plants comprising the seed coat 
peroxidase gene under control of constitutive or inducible promoters. Furthermore, 
this invention also relates to transgenic plants comprising the DNA regulatory regions
of nucleotides 1-1532 of SEQ ID NO:2 controlling a gene of interest, or comprising 
genes of interest in functional association with genomic DNA sequences exemplified 
by nucleotides 412-1041, 1234-2263 or 2430-2691 of SEQ ID NO:2. Also embraced 
by this invention are transgenic plants having regulatory regions comprising at least 24 
contiguous nucleotides selected from nucleotides 1-1532 of SEQ ID NO:2, 32 
contiguous nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2, 23 
contiguous nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2, or 22 
contiguous nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2. 

This invention is also directed to a method for the production of soybean seed
coat peroxidase in a host cell comprising:

i) transforming the host cell with a vector comprising an oligonucleotide 
sequence that encodes soybean seed coat peroxidase; and 

ii) culturing the host cell under conditions to allow expression of the 
soybean seed coat peroxidase. 

This invention also provides for a process for producing a heterologous gene 
of interest within seed coats of a transformed plant, comprising propagating a plant 
transformed with a vector comprising a gene of interest under the control of 
nucleotides 1-1532 of SEQ ID NO:2. Furthermore, this invention embraces a process 
for producing a heterologous gene of interest within seed coats of a transformed plant, 
comprising propagating a plant transformed with a vector comprising a gene of interest
under the control of a regulatory region comprising at least 24 nucleotides selected
from nucleotides 1-1532 of SEQ ID NO:2. 

Although the present invention is exemplified by a soybean seed coat peroxidase 
and adjacent DNA regulatory regions, in practice any gene of interest can be placed 
downstream from the DNA regulatory region for seed coat specific expression. 


BRIEF DESCRIPTION OF THE DRAWINGS 

These and other features of the invention will become more apparent from the 
following description in which reference is made to the appended drawings wherein: 

Figure 1 is the cDNA and deduced amino acid sequence of soybean seed coat 
peroxidase. Nucleotides are numbered by assigning +1 to the first base of the
ATG start codon; amino acids are numbered by assigning +1 to the N-terminal
Gln residue after cleavage of the putative signal sequence. The N-terminal
signal sequence, the region of the active site, and the heme-binding domain are 
underlined. The numerals I, II and III placed directly above single nucleotide 
gaps in the sequence indicate the three intron splice positions. The target site 
and direction of five different PCR primers are shown with dotted lines above 
the nucleotide sequence. An asterisk (*) marks the translation stop codon. 

Figure 2 is the genomic DNA sequence of the soybean seed coat peroxidase.

Figure 3 is a comparison of soybean seed coat peroxidase with other closely related 
plant peroxidases. The GenBank accession numbers are provided next to the 
name of the plant from which the peroxidase was isolated. The accession 
number for the soybean sequence is L78163. (A) A comparison of the nucleic 
acid sequences; (B) A comparison of the amino acid sequences. 

Figure 4 shows restriction fragment length polymorphisms between EpEp and epep
genotypes using the seed coat peroxidase cDNA as probe. Genomic DNA of
soybean lines OX312 (epep) and OX347 (EpEp) was digested with restriction
enzyme, separated by electrophoresis in a 0.5% agarose gel, transferred to
nylon, and hybridized with 32P-labelled cDNA encoding the seed coat
peroxidase. The size of the hybridizing fragments was estimated by comparison
to standards and is indicated on the right.

Figure 5 exhibits the structure of the Ep locus. A 17 kb fragment including the Ep
locus is illustrated schematically. A 3.3 kb portion of the gene is enlarged and
exons and introns are represented by shaded and open boxes, respectively. The
final enlargement of the 5' region shows the location and DNA sequence
around the 87 bp deletion occurring in the ep allele of soybean line OX312.
Nucleotides are numbered by assigning +1 to the first base of the ATG start
codon.

Figure 6 displays PCR analysis of EpEp and epep genotypes using primers derived 
from the seed coat peroxidase cDNA. Genomic DNA from soybean lines 
OX312 (epep) and OX347 (EpEp) was used as template for PCR analysis with 
four different primer sets. Amplification products were separated by 
electrophoresis through a 0.8% agarose gel and visualized under UV light after 
staining with ethidium bromide. Genotype and primer combinations are 
indicated at the top of the figure. The sizes in base pairs of the amplified DNA
fragments are indicated on the right.

Figure 7 exhibits PCR analysis of an F2 population from a cross of EpEp and epep 
genotypes. Genomic DNA was used as template for PCR analysis of the 
parents (P) and 30 F2 individuals. The cross was derived from the soybean lines
OX312 (epep) and OX347 (EpEp). Plants were self pollinated and seeds were
collected and scored for seed coat peroxidase activity. The symbols (-) and (+) 
indicate low and high seed coat peroxidase activity, respectively. Primers 
prx9+ and prx10- were used in the amplification reactions. Products were
separated by electrophoresis through a 0.8% agarose gel and visualized under 
UV light after staining with ethidium bromide. The migration of molecular 
markers and their corresponding size in kb is also shown (lanes M). 

Figure 8 displays PCR analysis of six different soybean cultivars with primers derived 
from the seed coat peroxidase cDNA sequence. Genomic DNA was used as 
template for PCR analysis of three EpEp cultivars and three epep cultivars. 
Primers used in the amplification reactions and the size of the DNA product are
indicated on the left. Products were separated by electrophoresis through a
0.8% agarose gel and visualized under UV light after staining with ethidium 
bromide. 

(A) Forward and reverse primers are downstream from deletion 

(B) Forward primer anneals to site within deletion 
(C) Primers span deletion

Figure 9 shows the accumulation of peroxidase RNA in tissues of EpEp and epep
plants. Figure 9(A): A comparison of peroxidase transcript abundance
in cultivars Harosoy 63 (Ep) or Marathon (ep). Seed and pod tissues
were sampled at a late stage of development corresponding to a whole
seed fresh weight of 250 mg. Root and leaf tissue was from six week
old plants. Autoradiograph exposed for 96 h. Figure 9(B):
Developmental expression of peroxidase in cultivar Harosoy 63 (Ep).
Flowers were sampled immediately after opening. Seed coat tissues
were sampled at four stages of development corresponding to a whole
seed fresh weight of: lane 1, 50 mg; lane 2, 100 mg; lane 3, 200 mg;
lane 4, 250 mg. Autoradiograph exposed for 20 h.


DESCRIPTION OF PREFERRED EMBODIMENT

The present invention is directed to a novel oligonucleotide sequence encoding 
a seed coat peroxidase and associated DNA regulatory regions. 

According to the present invention, DNA sequences that are "substantially
homologous" include sequences that are identified under conditions of high
stringency. "High stringency" refers to Southern hybridization conditions employing
washes at 65 °C with 0.1 x SSC, 0.5% SDS.

By "DNA regulatory region" it is meant any region within a genomic sequence 
that has the property of controlling the expression of a DNA sequence that is operably 
linked with the regulatory region. Such regulatory regions may include promoter or 
enhancer regions, and other regulatory elements recognized by one of skill in the art. 
A segment of the DNA regulatory region is exemplified in this invention; however, as
is understood by one of skill in the art, this region may be used as a probe to identify 
surrounding regions involved in the regulation of adjacent DNA, and such surrounding 
regions are also included within the scope of this invention. 

In the context of this disclosure, the term "promoter" or "promoter region" 
refers to a sequence of DNA, usually upstream (5') to the coding sequence of a 
structural gene, which controls the expression of the coding region by providing the 
recognition for RNA polymerase and/or other factors required for transcription to start 
at the correct site. 

There are generally two types of promoters, inducible and constitutive. An
"inducible promoter" is a promoter that is capable of directly or indirectly activating 
transcription of one or more DNA sequences or genes in response to an inducer. In 
the absence of an inducer the DNA sequences or genes will not be transcribed. 
Typically the protein factor, that binds specifically to an inducible promoter to activate 
transcription, is present in an inactive form which is then directly or indirectly 
converted to the active form by the inducer. The inducer can be a chemical agent such 
as a protein, metabolite, growth regulator, herbicide or phenolic compound or a 
physiological stress imposed directly by heat, cold, salt, or toxic elements or indirectly 
through the action of a pathogen or disease agent such as a virus. A plant cell 
containing an inducible promoter may be exposed to an inducer by externally applying 
the inducer to the cell or plant such as by spraying, watering, heating or similar 
methods. 

By "constitutive promoter" it is meant a promoter that directs the expression 
of a gene throughout the various parts of a plant and continuously throughout plant 
development. Examples of known constitutive promoters include those associated with 
the CaMV 35S transcript and Agrobacterium Ti plasmid nopaline synthase gene. 

The chimeric gene constructs of the present invention can further comprise a 
3' untranslated region. A 3' untranslated region refers to that portion of a gene 
comprising a DNA segment that contains a polyadenylation signal and any other 
regulatory signals capable of effecting mRNA processing or gene expression. The 
polyadenylation signal is usually characterized by effecting the addition of polyadenylic
acid tracts to the 3' end of the mRNA precursor. Polyadenylation signals are
commonly recognized by the presence of homology to the canonical form
5'-AATAAA-3' although variations are not uncommon.
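
By way of illustration, candidate polyadenylation signals of the canonical form can be located with a simple motif scan. The sketch below is a hedged example rather than a method taken from this disclosure: the function name `find_polyadenylation_signals` and the toy sequence are assumptions, and only exact matches to AATAAA on the given strand are reported, so the variant signals mentioned above would not be found.

```python
import re


def find_polyadenylation_signals(seq: str, motif: str = "AATAAA") -> list[int]:
    """Return 1-based start positions of every exact occurrence of `motif`
    on the given strand of `seq` (overlapping occurrences included).

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in pattern.finditer(seq)]


# Toy 3' sequence, made up for this example.
toy_utr = "GCTTAAATAAAGCTGCATGCAATAAATTT"
print(find_polyadenylation_signals(toy_utr))  # [6, 21]
```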

Examples of suitable 3' regions are the 3' transcribed non-translated regions 
containing a polyadenylation signal of Agrobacterium tumour inducing (Ti) plasmid 
genes, such as the nopaline synthase (Nos gene) and plant genes such as the soybean 
storage protein genes and the small subunit of the ribulose-1,5-bisphosphate
carboxylase (ssRUBISCO) gene. The 3' untranslated region from the structural gene 
of the present construct can therefore be used to construct chimeric genes for 
expression in plants. 

The chimeric gene construct of the present invention can also include further 
enhancers, either translation or transcription enhancers, as may be required. These 
enhancer regions are well known to persons skilled in the art, and can include the ATG 
initiation codon and adjacent sequences. The initiation codon must be in phase with 
the reading frame of the coding sequence to ensure translation of the entire sequence. 
The translation control signals and initiation codons can be from a variety of origins, 
both natural and synthetic. Translational initiation regions may be provided from the 
source of the transcriptional initiation region, or from the structural gene. The 
sequence can also be derived from the promoter selected to express the gene, and can 
be specifically modified so as to increase translation of the mRNA. 

To aid in identification of transformed plant cells, the constructs of this
invention may be further manipulated to include plant selectable markers. Useful
selectable markers include enzymes which provide for resistance to an antibiotic such
as gentamycin, hygromycin, kanamycin, and the like. Similarly, enzymes providing
for production of a compound identifiable by colour change such as GUS
(β-glucuronidase), or luminescence, such as luciferase, are useful.

Also considered part of this invention are transgenic plants containing the 
chimeric gene construct of the present invention. Methods of regenerating whole 
plants from plant cells are known in the art, and the method of obtaining transformed 
and regenerated plants is not critical to this invention. In general, transformed plant 
cells are cultured in an appropriate medium, which may contain selective agents such 
as antibiotics, where selectable markers are used to facilitate identification of 
transformed plant cells. Once callus forms, shoot formation can be encouraged by 
employing the appropriate plant hormones in accordance with known methods and the 
shoots transferred to rooting medium for regeneration of plants. The plants may then 
be used to establish repetitive generations, either from seeds or using vegetative 
propagation techniques.

The constructs of the present invention can be introduced into plant cells using 
Ti plasmids, Ri plasmids, plant virus vectors, direct DNA transformation,
micro-injection, electroporation, etc. For reviews of such techniques see for example
Weissbach and Weissbach (1988) and Grierson and Covey (1988). The present
invention further includes a suitable vector comprising the chimeric gene construct.

Buttery and Buzzell (1968) showed that the amount of peroxidase activity 
present in seed coats may vary substantially among different cultivars. The presence 
of a single dominant gene Ep causes a high seed coat peroxidase phenotype (Buzzell 
and Buttery, 1969). Homozygous recessive epep plants are ~100-fold lower in seed
coat peroxidase activity. This results from a reduction in the amount of peroxidase 
enzyme present, primarily in the hourglass cells of the subepidermis (Gijzen et al. , 
1993). In plants carrying the Ep gene, peroxidase is heavily concentrated in the 
hourglass cells (osteosclereids). These cells form a highly differentiated cell layer with 
thick, elongated secondary walls and large intercellular spaces (Baker et al. , 1987). 

Screening a seed coat cDNA library prepared from EpEp plants with a 
degenerate primer derived from the active site domain of plant peroxidase resulted in 
a high frequency of positive clones. Many of these clones encode identical cDNA 
molecules and indicate that the corresponding mRNA is an abundant transcript in 
developing seed coat tissues. The sequence of the cDNA is shown in Figure 1. 

Previous studies on soybean seed coat peroxidase indicated that this enzyme is 
heavily glycosylated and that carbohydrate contributes 18% of the mass of the
apo-enzyme (Gray et al., 1996). The seven potential glycosylation sites identified from the
amino acid sequence of the seed coat peroxidase (Figure 1) would accommodate the
five or six N-linked glycosylation sites proposed by Gray et al. (1996). The
heme-binding domain encompasses residues Asp161 to Phe171 and the acid-base catalysis
region from Gly33 to Cys44. The two regions are highly conserved among plant
peroxidases and are centred around functional histidine residues, His169 and His40.
There are eight conserved cysteine residues in the mature protein that provide for four
disulfide bridges found in other plant peroxidases and predicted from the crystal
structure of peanut peroxidase (Welinder, 1992; Schuller et al., 1996). Other
conserved areas include residues Cys91 to Ala105 and Val119 to Leu127 that occur in
or around helix D. The most divergent aspects of the seed coat peroxidase protein
sequence are the carboxy- and amino-terminal regions. These sequences probably
provide special targeting signals for the proper processing and delivery of the peptide 
chain. It is possible the carboxy-terminal extension of the seed coat peroxidase is 
removed at maturity, as has been shown for certain barley and horseradish peroxidases 
(Welinder, 1992). 

The molecular mass of the enzyme has been determined by denaturing gel 
electrophoresis to be 37 kDa (Sessa and Anderson, 1981; Gillikin and Graham, 1991) 
or 43 kDa (Gijzen et al., 1993). Analysis by mass spectrometry indicated a mass of
40,622 Da for the apo-enzyme and 33,250 Da after deglycosylation (Gray et al.,
1996). These values are in good agreement with the mass of 35,377 Da calculated from
the predicted amino acid sequence for the mature apo-protein prior to glycosylation and
other modifications. Huangpu et al. (1995) reported an anionic seed coat peroxidase
having a Mr of 30,577 Da and characterized a partial cDNA encoding this protein.
This 1031 bp cDNA contained an open reading frame of 849 bp encoding a 283 amino 
acid protein. There are several differences between this reported sequence and the 
sequence of this invention that are manifest at the amino acid level (see Figure 3 for 
sequence comparison). The enzyme encoded by the gene reported by Huangpu et al.
is different from that of this invention as the peroxidase of this invention has a Mr of
35,377 Da.
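
The mass figures quoted above can be cross-checked with a few lines of arithmetic. This is only a worked sketch using the numbers stated in the text; the variable names are illustrative, and the carbohydrate mass is assumed here to be the simple difference between the two mass spectrometry values.

```python
# Masses in daltons, as quoted in the text above.
apo_enzyme_mass = 40_622        # mass spectrometry, before deglycosylation
deglycosylated_mass = 33_250    # mass spectrometry, after deglycosylation
predicted_mass = 35_377         # calculated from the deduced mature sequence
huangpu_mass = 30_577           # Huangpu et al. (1995) partial clone

# Assumption: carbohydrate mass is the difference of the two measured values.
carbohydrate_mass = apo_enzyme_mass - deglycosylated_mass
carbohydrate_percent = round(100 * carbohydrate_mass / apo_enzyme_mass, 1)

# Gap between the predicted mass of this peroxidase and the Huangpu protein.
difference_from_huangpu = predicted_mass - huangpu_mass

# An 849 bp open reading frame corresponds to 849 / 3 codons.
codons_in_huangpu_orf = 849 // 3

print(carbohydrate_mass)          # 7372
print(carbohydrate_percent)       # 18.1
print(difference_from_huangpu)    # 4800
print(codons_in_huangpu_orf)      # 283
```

On these figures the carbohydrate share comes to about 18%, in line with the value attributed to Gray et al. (1996), and the two predicted protein masses differ by 4,800 Da.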

Genomic DNA blots probed with the seed coat peroxidase cDNA produced two
or three hybridizing fragments of varying intensity with most restriction enzyme
digestions, despite the fact that several peroxidase isozymes are present in soybean.
The results indicate that this seed coat peroxidase is present as a single gene that does
not share sufficient homology with most other peroxidase genes to anneal under
conditions of high stringency.

The genomic DNA sequence comprises four exons spanning bp 1533-1752
(exon 1), 2383-2574 (exon 2), 3605-3769 (exon 3) and 4033-4516 (exon 4) and three
introns comprising 1752-2382 (intron 1), 2575-3604 (intron 2) and 3770-4032 (intron
3), of SEQ ID NO:2. Features of the upstream regulatory region of the genomic DNA
include a TATA box centred on bp 1487 and a cap signal 32 bp downstream centred on
bp 1520. Also noted within the genomic sequence are three polyadenylation signals
centred on bp 4520, 4598 and 4663, and a polyadenylation site at bp 4700.
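
The exon coordinates given above lend themselves to a quick consistency check. The sketch below is illustrative only: it uses the four exon spans as stated (1-based, inclusive, on SEQ ID NO:2) and derives the intervening regions purely as the gaps between consecutive exons, so the derived spans may differ by a base from the intron coordinates quoted in the text. The names `exons` and `span_length` are assumptions introduced for this example.

```python
# Exon coordinates on SEQ ID NO:2 as stated in the text (1-based, inclusive).
exons = {
    "exon 1": (1533, 1752),
    "exon 2": (2383, 2574),
    "exon 3": (3605, 3769),
    "exon 4": (4033, 4516),
}


def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate span."""
    return end - start + 1


exon_lengths = {name: span_length(*coords) for name, coords in exons.items()}

# Derive the intervening regions as the gaps between consecutive exons.
ordered = list(exons.values())
gaps = [(prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
gap_lengths = [span_length(*gap) for gap in gaps]

print(exon_lengths)   # {'exon 1': 220, 'exon 2': 192, 'exon 3': 165, 'exon 4': 484}
print(gaps)           # [(1753, 2382), (2575, 3604), (3770, 4032)]
print(gap_lengths)    # [630, 1030, 263]
```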

This promoter is considered seed coat specific since the peroxidase protein
encoded by the Ep gene accumulates in the seed coat tissues, especially in the hourglass 
cells of the subepidermis, and is not expressed in other tissues, aside from a marginal 
expression of peroxidase in the root tissues. This is also true at the transcriptional 
level (see Figure 9). The DNA regulatory regions of the genomic sequence of Figure 
2 are used to control the expression of the adjacent peroxidase gene in seed coat tissue. 
Such regulatory regions include nucleotides 1-1532. Other regions of interest include 
nucleotides 1752-2382, 2575-3604 and/or 3770-4032 of SEQ ID NO:2. Therefore 
other proteins of interest may be expressed in seed coat tissues by placing a gene 
capable of expressing the protein of interest under the control of the DNA regulatory 
elements of this invention. Genes of interest include but are not restricted to herbicide
resistance genes, genes encoding viral coat proteins, or genes encoding proteins
conferring biological control of pests or pathogens, such as an insecticidal protein, for
example B. thuringiensis toxin. Other genes include those capable of the production
of proteins that alter the taste of the seed and/or that affect the nutritive value of the 
soybean. 

A modified DNA regulatory sequence may be obtained by introducing changes 
into the natural sequence. Such modifications can be done through techniques known 
to one of skill in the art such as site-directed mutagenesis, reducing the length of the
regulatory region using endonucleases or exonucleases, or increasing the length through
the insertion of linkers or other sequences of interest. Reducing the size of the DNA
regulatory region may be achieved by removing 3' or 5' regions of the regulatory
region of the natural sequence by using a nuclease such as BAL 31 (Sambrook et
al., 1989). However, any such DNA regulatory region must still function as a seed coat
specific DNA regulatory region. 

It may be readily determined if such modified DNA regulatory elements are
capable of acting in a seed coat specific manner by transforming plant cells with such
regulatory elements controlling the expression of a suitable marker gene, culturing
these plants and determining the expression of the marker gene within the seed coat as
outlined above. One may also analyze the efficacy of DNA regulatory elements by
introducing constructs comprising a DNA regulatory element of interest operably
linked with an appropriate marker into seed coat tissues by using particle bombardment
directed to seed coat tissue and determining the degree of expression of the regulatory
region as is known to one of skill in the art.

Two tandemly arranged genes encoding anionic peroxidase expressed in stems
of Populus kitakamiensis, prxA3a and prxA4a, have been cloned and characterized
(Osakabe et al., 1995). Both of these genomic sequences contained four exons and
three introns and encoded proteins of 347 and 343 amino acids, respectively. The two
genes encode distinct isozymes with deduced Mr values of 33.9 and 34.6 kDa.
Furthermore, a 532 bp promoter derived from the peroxidase gene of Armoracia
rusticana has also been reported (Toyobo KK, JP 4,126,088, April 27, 1992).
However, a search using GenBank revealed no substantial similarity between the
promoter region, or introns 1, 2 and 3 of this invention and those within the literature.

Digestion of the genomic DNA with BamHI or SacI revealed restriction
fragment length polymorphisms that distinguished EpEp and epep genotypes. Although
the XbaI digestion did not produce a readily detectable polymorphism, the size of the
hybridizing fragment in both genotypes was ~14 kb. Thus, a 0.3 kb size difference is
outside of the resolving power of the separation for fragments this large. Sequence
analysis of EpEp and epep genotypes indicates that the mutant ep allele is missing 87
bp of sequence at the 5' end of the structural gene. This would account for the
drastically reduced amounts of peroxidase enzyme present in seed coats of epep plants
since the deletion includes the translation start codon and the entire N-terminal signal
sequence. However, the 87 bp deletion cannot account for the differences observed in
the RFLP analysis since the missing fragment does not include a BamHI site and is
much smaller than the 0.3 kb polymorphism detected in the SacI digestion. Thus,
other genetic rearrangements must occur in the vicinity of the ep locus that lead to
these polymorphisms.

The results shown here indicate that the mutation causing low seed coat 
peroxidase activity occurs in the structural gene encoding the enzyme. This mutation 
is an 87 bp deletion in the 5' region of the gene encompassing the translation start site. 
Several different low peroxidase cultivars share a similar mutation in the same area, 
suggesting that the recessive ep alleles have a common origin or that the region is 
prone to spontaneous deletions or rearrangements. 



Due to the industrial interest in soybean seed coat peroxidase, alternate sources 
for the production of this enzyme are needed. The DNA of this invention, encoding 
the seed coat soybean peroxidase under the control of a suitable promoter and 
expressed within a host of interest, can be used for the preparation of recombinant 
soybean seed coat peroxidase enzyme. 

Soybean seed coat peroxidase has been characterized as a lignin-type peroxidase 
with industrially significant properties, i.e.: high activity and stability under acidic 
conditions; wide substrate specificity; catalytic properties equivalent to those 
of Phanerochaete chrysosporium lignin peroxidase (the currently preferred enzyme 
for treatment of industrial waste waters; Wick 1995), with at least 150-fold greater 
stability; and greater stability than horseradish peroxidase, which is also used in 
industrial effluent treatments and medical diagnostic kits (McEldoon et al., 1995). 
These properties are useful within industrial applications for the degradation of natural 
aromatic polymers including lignin and coal (McEldoon et al., 1995), and underlie the 
preferred use of soybean peroxidase, over that of horseradish peroxidase, in medical 
diagnostic tests as an enzyme label for antigens, antibodies, and oligonucleotide probes, 
and within staining techniques (Wick 1995). Soybean peroxidase is also used in the 
deinking of printed waste paper (Johnson et al., U.S. 5,270,770; December 6, 1994) 
and for the biocatalytic oxidation of primary alcohols (Johnson et al., U.S. 5,391,488; 
February 13, 1996). Soybean peroxidase has also been used as a replacement for 
chlorine in the pulp and paper industry, in order to remove chlorine, phenolic or 
aromatic amine containing pollutants from industrial waste waters (Wick 1995), or as 
a formaldehyde replacement (Freiberg, 1995) for use in adhesives, abrasives, and 
protective coatings (e.g. varnish and resins, Wick 1995). 

Furthermore, the seed coat peroxidase gene may be expressed in an organ or 
tissue specific manner within a plant. For example, the quality and strength of cotton 
fibre can be improved through the over-expression of cotton or horseradish peroxidase 
placed under the control of a fibre-specific promoter (Maliyakal, WO 95/08914; April 
6, 1995). 

Similarly, seed-specific DNA regulatory regions of this invention may be used 
to control expression of genes of interest such as: 

i) genes encoding herbicide resistance, or 

ii) biological control of insects or pathogens (e.g. B. thuringiensis), or 

iii) viral coat proteins to protect against viral infections, or 

iv) proteins of commercial interest (e.g. pharmaceutical), and 

v) proteins that alter the nutritive value, taste, or processing of seeds 
within the seed coat of plants. 

While this invention is described in detail with particular reference to preferred 
embodiments thereof, said embodiments are offered to illustrate but not to limit the 
invention. 


EXAMPLES 

Plant material 

All soybean (Glycine max [L.] Merr) cultivars and breeding lines were from the 
collection at Agriculture Canada, Harrow, Ontario. 

Seed Coat cDNA library Construction and Screening 

High seed coat peroxidase (EpEp) soybean cultivar Harosoy 63 plants were 
grown in field plots outdoors. Pods were harvested 35 days after flowering and seeds 
in the mid-to-late developmental stage were excised. The average fresh mass was 250 
mg per seed. Seed coats were dissected and immediately frozen in liquid nitrogen. The 
frozen tissue was lyophilized and total RNA extracted in 100 mM Tris-HCl pH 9.0, 
20 mM EDTA, 4% (w/v) sarkosyl, 200 mM NaCl, and 16 mM DTT, and precipitated 
with LiCl using the standard phenol/chloroform method described by Wang and 
Vodkin (1994). The poly(A)+ RNA was purified on oligo(dT) cellulose columns prior 
to cDNA synthesis, size selection, ligation into the λ ZAP Express vector, and 
packaging according to the manufacturer's instructions (Stratagene). A degenerate 
oligonucleotide with the 5' to 3' sequence TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT 
was 5' end labelled to high specific activity and used as a probe to isolate peroxidase 
cDNA clones (Sambrook et al., 1989). Duplicate plaque lifts were made to nylon filters 
(Amersham), UV fixed, and prehybridized at 36 °C for 3 h in 6 x SSC, 20 mM 
Na2HPO4 (pH 6.8), 5 x Denhardt's, 0.4% SDS, and 500 µg/mL salmon sperm DNA. 
Hybridization was in the same buffer, without Denhardt's, at 36 °C for 16 h. Filters 
were washed quickly with several changes of 6 x SSC and 0.1% SDS, first at room 
temperature and finally at 40 °C, prior to autoradiography for 16 h at -70 °C with an 
intensifying screen. 
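
The degenerate probe described above can be expanded into its individual oligonucleotides. The following is a minimal sketch in Python, offered for illustration only; the function names (parse_degenerate, expand) are hypothetical, and the comparison site is copied from codons 67 to 72 of SEQ ID NO:1 (Phe His Asp Cys Phe Val) as printed in the Sequence Listing. 

```python
from itertools import product

# Probe as printed in the text; each (C/T) marks a two-fold degenerate position.
PROBE = "TT(C/T)CA(C/T)GA(C/T)TG(C/T)TT(C/T)GT"


def parse_degenerate(pattern):
    """Split a pattern such as 'TT(C/T)CA' into a list of per-position base choices."""
    positions = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            close = pattern.index(")", i)
            positions.append(pattern[i + 1:close].split("/"))
            i = close + 1
        else:
            positions.append([pattern[i]])
            i += 1
    return positions


def expand(pattern):
    """Return every concrete oligonucleotide represented by the degenerate pattern."""
    return ["".join(bases) for bases in product(*parse_degenerate(pattern))]


oligos = expand(PROBE)

# First 17 nt of codons 67-72 of SEQ ID NO:1 (TTT CAT GAT TGC TTT GT).
cdna_site = "TTTCATGATTGCTTTGT"

print(len(oligos[0]), len(oligos), cdna_site in oligos)  # 17 32 True
```

Under these assumptions the probe is a pool of 32 distinct 17-mers, one of which matches the cloned cDNA exactly at the conserved active site region. 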

Genomic DNA Isolation, Library Construction, and DNA Blot Analysis 

Soybean genomic DNA was isolated from leaves of greenhouse grown plants 
or from etiolated seedlings grown in vermiculite. Plant tissue was frozen in liquid 
nitrogen and lyophilized before extraction and purification of DNA according to the 
method of Dellaporta et al. (1983). Restriction enzyme digestion of 30 µg DNA, 
separation on 0.5% agarose gels and blotting to nylon membranes followed standard 
protocols (Sambrook et al., 1989). For construction of the genomic library, DNA 
purified from Harosoy 63 leaf tissue was partially digested with BamHI and ligated into 
the λ FIX II vector (Stratagene). Gigapack XL packaging extract (Stratagene) was used 
to select for inserts of 9 to 22 kb. After library amplification, duplicate plaque lifts 
were hybridized to the cDNA probe. 

Blots or filter lifts were prehybridized for 2 h at 65 °C in 6 x SSC, 5 x 
Denhardt's, 0.5% SDS, and 100 µg/mL salmon sperm DNA. Radiolabeled cDNA 
probe (20 to 50 ng) was prepared using the Ready-to-Go labelling kit (Pharmacia) and 
32P-dCTP (Amersham). Unincorporated 32P-dCTP was removed by spin column 
chromatography before adding radiolabeled cDNA to the hybridization buffer 
(identical to prehybridization buffer without Denhardt's). Hybridization was for 20 h 
at 65 °C. Membranes were washed twice for 15 min at room temperature with 2 x SSC, 
0.5% SDS, followed by two 30 min washes at 65 °C with 0.1 x SSC, 0.5% SDS. 
Autoradiography was for 20 h at -70 °C using an intensifying screen and X-OMAT film 
(Kodak). 

DNA Sequencing 

Sequencing of DNA was performed using dye-labelled terminators and Taq-FS 
DNA polymerase (Perkin-Elmer). The PCR protocol consisted of 25 cycles of a 30 sec 
melt at 96 °C, 15 sec annealing at 50 °C, and 4 min extension at 60 °C. Samples were 
analyzed on an Applied Biosystems 373A Stretch automated DNA sequencer. 

Polymerase Chain Reaction 

PCR amplifications contained 1 ng template DNA, 5 pmol each primer, 1.5 
mM MgCl2, 0.15 mM deoxynucleotide triphosphates mix, 10 mM Tris-HCl, 50 mM 
KCl, pH 8.3, and 1 unit of Taq polymerase (Gibco BRL) in a total volume of 25 µL. 
Reactions were performed in a Perkin-Elmer 480 thermal cycler. After an initial 2 min 
denaturation at 94 °C, there were 35 cycles of 1 min denaturation at 94 °C, 1 min 
annealing at 52 °C, and 2 min extension at 72 °C. A final 7 min extension at 72 °C 
completed the program. The following primers were used for PCR analysis of genomic 
DNA: 

prx2+     CTTCCAAATATCAACTCAAT 
prx6-     TAAAGTTGGAAAAGAAAGTA 
prx9+     ATGCATGCAGGTTTTTCAGT 
prx10-    TTGCTCGCTTTCTATTGTAT 
prx12+    TCTTCGATGCTTCTTTCACC 
prx29+    CATAAACAATACGTACGTGAT 
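
The orientation of the primers listed above can be checked against the cDNA sequence. The sketch below is illustrative only: it assumes that "+" denotes a sense-strand primer and "-" an antisense primer, uses the first 336 nucleotides of SEQ ID NO:1 as printed in the Sequence Listing, and the helper names (reverse_complement, locate) are hypothetical. Primers prx6- and prx29+ anneal outside this 336 nt window and are therefore omitted. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Nucleotides 1-336 of SEQ ID NO:1, copied row by row (16 codons per row).
CDNA_ROWS = [
    "ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT GCA TTT GCT",
    "ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT CAG CTT ACT CCT ACG TTC",
    "TAC AGA GAA ACA TGT CCA AAT CTG TTC CCT ATT GTG TTT GGA GTA ATC",
    "TTC GAT GCT TCT TTC ACC GAT CCC CGA ATC GGG GCC AGT CTC ATG AGG",
    "CTT CAT TTT CAT GAT TGC TTT GTT CAA GGT TGT GAT GGA TCA GTT TTG",
    "CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA CTT CCA AAT",
    "ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC AAG ACA GCG",
]
CDNA = "".join(CDNA_ROWS).replace(" ", "")

PRIMERS = {
    "prx9+": "ATGCATGCAGGTTTTTCAGT",
    "prx12+": "TCTTCGATGCTTCTTTCACC",
    "prx10-": "TTGCTCGCTTTCTATTGTAT",
    "prx2+": "CTTCCAAATATCAACTCAAT",
}


def locate(name):
    """Return the 1-based (start, end) of a primer footprint on the sense strand, or None."""
    seq = PRIMERS[name]
    target = seq if name.endswith("+") else reverse_complement(seq)
    index = CDNA.find(target)
    if index < 0:
        return None
    return index + 1, index + len(target)


for name in PRIMERS:
    print(name, locate(name))
# prx9+ (49, 68)
# prx12+ (143, 162)
# prx10- (254, 273)
# prx2+ (280, 299)
```

Positions are cDNA coordinates; the corresponding genomic distances are larger wherever an intron lies between two primer sites. 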


RNA Isolation 


For isolation of RNA, tissue was harvested from greenhouse grown plants, 
dissected, frozen in liquid nitrogen, and lyophilized prior to extraction. Total RNA was 
purified from seed coats, embryos, pods, leaves, and flowers using the standard 
phenol/chloroform method (Sambrook et al., 1989). This method did not afford good 
yields of RNA from roots; therefore, this tissue was extracted with TRIzol reagent 
(GibcoBRL) and total RNA purified according to the manufacturer's instructions with an 
additional phenol-chloroform extraction step. The amount of RNA was estimated by 
measuring absorbance at 260 and 280 nm, and by electrophoretic separation in 
formaldehyde gels followed by staining with ethidium bromide and comparison to 
known standards. Total RNA (10 µg per sample) was prepared, subjected to 
electrophoresis through a 1% agarose gel containing formaldehyde, and then stained 
with ethidium bromide to ensure equal loading of samples. The gel was blotted to 
nylon (Hybond™-N, Amersham) according to standard methods and the RNA was fixed 
to the membrane by UV cross linking. 

Seed Coat Peroxidase Assays 

The F3 seed was measured for peroxidase activity to score the phenotype of the 
F2 population because the seed testa is derived from maternal tissue. The seeds were 
briefly soaked in water and the seed coat was dissected from the embryo and placed in 
a vial. Ten drops (~500 µL) of 0.5% guaiacol were added and the sample was left to 
stand for 10 min before adding one drop (~50 µL) of 0.1% H2O2. An immediate 
change in colour of the solution, from clear to red, indicates a positive result and high 
seed coat peroxidase activity. 

Example 1: The Seed Coat Peroxidase cDNA and genomic DNA sequences 

To isolate the seed coat peroxidase transcript, a cDNA library was constructed 
from developing seed coat tissue of the EpEp cultivar Harosoy 63. The primary 
library contained 10^6 recombinant plaque forming units and was amplified prior to 
screening. A degenerate 17-mer oligonucleotide corresponding to the conserved active 
site domain of plant peroxidases was used to probe the library. In screening 10,000 
plaque forming units, 12 positive clones were identified. The cDNA insert size of the 
clones ranged from 0.5 to 2.5 kb, but six clones shared a common insert size of 1.3 
kb. These six clones (soyprx03, soyprx05, soyprx06, soyprx11, soyprxU, and 
soyprx14) were chosen for further characterization since the 1.3 kb insert size matched 
the expected peroxidase transcript size. Sequence analysis of the six clones showed that 
they contained identical cDNA transcripts encoding a peroxidase and that each resulted 
from an independent cloning event, since the junction between the cloning vector and 
the transcript was different in all cases. 

Since it was not clear that the entire 5' end of the cDNA transcript was 
complete in any of the cDNA clones isolated, the structural gene corresponding to the 
seed coat peroxidase was isolated from a Harosoy 63 genomic library. A partial BamHI 
digest of genomic DNA was used to construct the library and more than 10^6 plaque 
forming units were screened using the cDNA probe. A positive clone, G25-2-1-1-1, 
containing a 17 kb insert was identified and a 4.7 kb region encoding the peroxidase 
was sequenced (SEQ ID NO:2). This region includes 1532 nucleotides of the 5' region 
of the peroxidase gene. 

The genomic sequence matched the cDNA sequence except for three introns 
encoded within the gene. The genomic sequence also revealed two additional 
translation start codons, beginning one bp and 10 bp upstream from the 5' end of the 
longest cDNA transcript isolated. Figure 1 shows the deduced cDNA sequence. The 
open reading frame of 1056 bp encodes a 352 amino acid protein of 38,106 Da. A 
heme-binding domain, a peroxidase active site signature sequence, and seven potential 
N-glycosylation sites were identified from the deduced amino acid sequence. The first 
26 amino acid residues conform to a membrane spanning domain. Cleavage of this 
putative signal sequence releases a mature protein of 326 residues with a mass of 
35,377 Da and an estimated pI of 4.4. 

Relevant features of the genomic fragment (Figure 2) include four exons at bp 
192-411 (exon 1; 1533-1751 of SEQ ID NO:2), 1042-1233 (exon 2; 2383-2574 of 
SEQ ID NO:2), 2263-2429 (exon 3; 3605-3769 of SEQ ID NO:2) and 2692-3174 
(exon 4; 4033-4516 of SEQ ID NO:2), and three introns at bp 412-1041 (intron 1; 
1752-2382 of SEQ ID NO:2), 1234-2263 (intron 2; 2575-3604 of SEQ ID NO:2) and 
2430-2691 (intron 3; 3770-4032 of SEQ ID NO:2). The 1532 bp regulatory region of 
the genomic DNA includes a TATA box centred on bp 1487 and a cap signal 32 bp 
downstream, centred at bp 1520 of SEQ ID NO:2. Also noted within the genomic 
sequence are three polyadenylation signals centred on bp 4520, 4598 and 4700, and a 
polyadenylation site at bp 4700 of SEQ ID NO:2. 
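
The open reading frame and exon/intron figures given above can be cross-checked with a few lines of Python. This is an illustrative sketch, not part of the reported analysis: it assumes the standard genetic code, takes the first 26 codons from SEQ ID NO:1 and the feature coordinates from the SEQ ID NO:2 feature table, and all variable names are hypothetical. 

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# First 26 codons of the open reading frame, copied from SEQ ID NO:1.
SIGNAL_CODONS = (
    "ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT GCA TTT GCT "
    "ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT"
)
signal_peptide = translate(SIGNAL_CODONS.replace(" ", ""))

ORF_BP = 1056
total_residues = ORF_BP // 3
mature_residues = total_residues - len(signal_peptide)

# Exon and intron coordinates in SEQ ID NO:2 (1-based, inclusive).
FEATURES = [
    ("exon 1", 1533, 1751),
    ("intron 1", 1752, 2382),
    ("exon 2", 2383, 2574),
    ("intron 2", 2575, 3604),
    ("exon 3", 3605, 3769),
    ("intron 3", 3770, 4032),
    ("exon 4", 4033, 4516),
]
lengths = {name: end - start + 1 for name, start, end in FEATURES}
# Each feature should begin one base after the previous feature ends.
contiguous = all(nxt[1] == cur[2] + 1 for cur, nxt in zip(FEATURES, FEATURES[1:]))

print(signal_peptide)                                # MGSMRLLVVALLCAFAMHAGFSVSYA
print(total_residues, mature_residues)               # 352 326
print([lengths[f"intron {n}"] for n in (1, 2, 3)])   # [631, 1030, 263]
print(contiguous)                                    # True
```

The residue counts agree with the 352 and 326 amino acid figures in the text, and the intron lengths agree with the 631 bp, 1030 bp and 263 bp introns described for the structural gene. 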

Figure 3 illustrates the relationship between the soybean seed coat peroxidase 
and other selected plant peroxidases. The soybean sequence is most closely related to 
four peroxidase cDNAs isolated from alfalfa (see Figure 3), sharing from 65 to 67% 
identity at the amino acid level with the alfalfa proteins (X90693, X90694, X90692, 
el-Turk et al. 1996; L36156, Abrahams et al. 1994). When compared with other plant 
peroxidases, soybean seed coat peroxidase exhibits from 60 to 65% identity with 
poplar (D30653 and D30652, Osakabe et al. 1994) and flax (L0554, Omann and Tyson 
1995); 50 to 60% identity with horseradish (M37156, Fujiyama et al. 1988), tobacco 
(D11396, Osakabe et al. 1993), and cucumber (M91373, Rasmussen et al. 1992); and 
49% identity with barley (L36093, Scott-Craig et al. 1994), wheat (X85228, Baga et 
al. 1995) and tobacco (L02124, Diaz-De-Leon et al. 1993) peroxidases. 

A comparison of the promoter region, 1-1532 of SEQ ID NO:2, indicates that 
there are no similar sequences present within the GenBank database. 

Example 2: DNA Blot Analysis Using the Seed Coat Peroxidase cDNA Probe 
Reveals Restriction Fragment Length Polymorphisms Between EpEp and epep 
Genotypes 

Genomic DNA blots of OX347 (EpEp) and OX312 (epep) plants were 
hybridized with 32P-labelled cDNA to estimate the copy number of the seed coat 
peroxidase gene and to determine if this locus is polymorphic between the two 
genotypes. Figure 4 shows the hybridization patterns after digestion with BamHI, XbaI, 
and SacI. Restriction fragment length polymorphisms are clearly visible in the BamHI 
and SacI digestions. The BamHI digestion produced a strongly hybridizing 17 kb 
fragment and a faint 3.4 kb fragment in the EpEp genotype. The 3.4 kb BamHI 
fragment is visible in the epep genotype but the 17 kb fragment has been replaced by 
a signal at >20 kb. The SacI digestion resulted in detection of three fragments in EpEp 
and epep plants. At least two fragments were expected here since the cDNA sequence 
has a SacI site within the open reading frame. However, the smallest and most strongly 
hybridizing of these fragments is 5.2 kb in EpEp plants and 4.9 kb in epep plants. 
Digestion with XbaI produced hybridizing fragments of ~14 kb and 7.8 kb for both 
genotypes, with the larger fragment showing a stronger signal. 


Example 3: A Deletion Mutation Occurs in the Recessive ep Locus 

The structural gene encoding the seed coat peroxidase is schematically 
illustrated in Figure 5. The 17 kb BamHI fragment encompassing the gene includes 191 
bp of sequence upstream from the translation start codon, three introns of 631 bp, 1030 
bp, and 263 bp, and 13 kb of sequence downstream from the polyadenylation site. The 
arrangement of four exons and three introns and the placement of introns within the 
sequence is similar to that described for other plant peroxidases (Simon, 1992; Osakabe 
et al., 1995). 

Primers were designed from the DNA sequence to compare EpEp and epep 
genotypes by PCR analysis. Figure 6 shows PCR amplification products from four 
different primer combinations using OX312 (epep) and OX347 (EpEp) genomic DNA 
as template. The primer annealing site for prx29+ begins 182 bp upstream from the 
ATG start codon; the remaining primer sites are shown in Figure 1. Amplification with 
primers prx2+ and prx6-, and with prx12+ and prx10-, produced the expected 
products of 1.9 kb and 860 bp, respectively, regardless of the Ep/ep genotype of the 
template DNA. However, PCR amplification with primers prx9+ and prx10-, and with 
prx29+ and prx10-, generated the expected products only when template DNA was 
from plants carrying the dominant Ep allele. When template DNA was from an epep 
genotype, no product was detected using primers prx9+ and prx10- and a smaller 
product was amplified with primers prx29+ and prx10-. The products resulting from 
amplification of OX312 or OX347 template DNA with primers prx29+ and prx10- 
were directly sequenced and compared. The polymorphism is due to an 87 bp deletion 
occurring within this DNA fragment in OX312 plants, as shown in Figure 5. This 
deletion begins nine bp upstream from the translation start codon and includes 78 bp 
of sequence at the 5' end of the open reading frame, including the prx9+ primer 
annealing site. 
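
The effect of the 87 bp deletion on the PCR assay can be modelled with simple coordinate arithmetic. The sketch below is a hypothetical illustration, not the procedure used in the Examples: the primer footprints are assumed positions obtained by matching the primer sequences against SEQ ID NO:2 as printed, the start codon is taken at bp 1533, and the computed sizes are predictions from those coordinates rather than measured values. 

```python
ATG = 1533  # first base of the translation start codon in SEQ ID NO:2

# 87 bp deletion beginning nine bp upstream of the start codon (1-based, inclusive).
DELETION = (ATG - 9, ATG - 9 + 87 - 1)
orf_bases_deleted = DELETION[1] - ATG + 1

# Assumed primer footprints in SEQ ID NO:2, as (start, end) on the sense strand.
SITES = {
    "prx29+": (1351, 1371),  # begins 182 bp upstream of the ATG
    "prx9+": (1581, 1600),
    "prx10-": (2417, 2436),
}


def overlaps(a, b):
    """Return True when two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def amplicon(forward, reverse, deleted=None):
    """Return the predicted product length in bp, or None if a primer site is lost."""
    f, r = SITES[forward], SITES[reverse]
    size = r[1] - f[0] + 1
    if deleted is None:
        return size
    if overlaps(f, deleted) or overlaps(r, deleted):
        return None
    if f[1] < deleted[0] and deleted[1] < r[0]:
        size -= deleted[1] - deleted[0] + 1
    return size


print(DELETION, orf_bases_deleted)            # (1524, 1610) 78
print(amplicon("prx9+", "prx10-"))            # 856
print(amplicon("prx9+", "prx10-", DELETION))  # None
print(amplicon("prx29+", "prx10-"))           # 1086
print(amplicon("prx29+", "prx10-", DELETION)) # 999
```

Under these assumptions the model reproduces the observed pattern: the prx9+ site falls inside the deletion, so no product is expected from epep template, while the prx29+ and prx10- product is shortened by 87 bp. The predicted 856 bp product for prx9+ and prx10- is close to the roughly 860 bp product reported. 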

To test whether this deletion mutation cosegregates with the seed coat 
peroxidase phenotype, genomic DNA from an F2 population segregating at the Ep locus 
was amplified using primers prx9+ and prx10- and F3 seed was tested for seed coat 
peroxidase activity. Figure 7 shows the results from this analysis. Of the 30 F2 
individuals tested, all 23 that were high in seed coat peroxidase activity produced the 
expected 860 bp PCR amplification product. The remaining seven F2 individuals with 
low seed coat peroxidase activity produced no detectable PCR amplification products. 
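
As an illustrative check that is not part of the reported analysis, the 23 high to 7 low segregation can be compared with the 3:1 ratio expected for a single dominant gene using a chi-square goodness-of-fit test. The sketch below assumes one degree of freedom and uses only the Python standard library; the variable names are hypothetical. 

```python
import math

# Phenotype counts among the 30 F2 individuals reported in the text.
observed = {"high": 23, "low": 7}
# Assumed Mendelian expectation for a single dominant gene.
expected_ratio = {"high": 3, "low": 1}

total = sum(observed.values())
ratio_sum = sum(expected_ratio.values())
expected = {k: total * r / ratio_sum for k, r in expected_ratio.items()}

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# For one degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.3f}, p = {p_value:.2f}")
```

A small chi-square value with a large p value would indicate that the observed counts do not depart detectably from a 3:1 ratio, which is consistent with segregation of a single dominant Ep allele. 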

Finally, to determine if the OX312 (epep) and OX347 (EpEp) breeding lines are 
representative of soybean cultivars that differ in seed coat peroxidase activity, several 
cultivars were tested by PCR analysis using primer combinations targeted to the Ep 
locus. Figure 8 shows results from this analysis of six different soybean cultivars, three 
each of the homozygous dominant EpEp and recessive epep genotypes. As observed 
with OX312 and OX347, amplification products of the expected size were produced 
with primers prx12+ and prx10- regardless of the genotype, whereas epep genotypes 
yielded no product with primers prx9+ and prx10- and a smaller fragment with primers 
prx29+ and prx10-. 

Example 4: Developmental Pattern of Expression of the Ep gene 

The seed coat peroxidase mRNA levels were determined by hybridizing RNA gel blots 
with radiolabelled cDNA probe. The figure illustrates the transcript abundance in 
various tissues of epep and EpEp plants. The mRNA accumulated to high levels in seed 
coat tissues of EpEp plants, especially in the later stages of development when whole 
seed fresh weight exceeded 50 mg. Low levels of transcript could also be detected in 
root tissues but not in the flower, embryo, pod or leaf. The transcript could also be 
detected in seed coat and root tissues of epep plants, but in drastically reduced amounts 
compared to the EpEp genotype. The reduced amounts of peroxidase mRNA present in 
seed coats of epep plants indicate that the transcriptional process and/or the stability of 
the resulting mRNA is severely affected. The Ep gene has a TATA box and a 5' cap 
signal beginning 47 bp and 15 bp, respectively, upstream from the translation start 
codon. The 87 bp deletion in the ep allele extends into the 5' cap signal and therefore 
could interfere with transcript processing. Regardless, any resulting transcript will not 
be properly translated, since the AUG initiation codon and the entire amino-terminal 
signal sequence are deleted from the ep allele. Not wishing to be bound by theory, the 
lack of peroxidase accumulation in seed coats of epep plants appears to be due to at 
least two factors, greatly reduced transcript levels and ineffective translation, resulting 
from mutation of the structural gene encoding the enzyme. In summary, the results 
indicate that the Ep gene regulatory elements can drive high level expression in a 
tightly coordinated, tissue and developmentally specific manner. 

All scientific publications and patent documents are incorporated herein by 
reference. 

The present invention has been described with regard to preferred 
embodiments. However, it will be obvious to persons skilled in the art that a number 
of variations and modifications can be made without departing from the scope of the 
invention as described in the following claims. 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

    (i) APPLICANT: GIJZEN, Mark 

    (ii) TITLE OF INVENTION: SEED COAT SPECIFIC DNA REGULATORY REGION 
         AND PEROXIDASE 

    (iii) NUMBER OF SEQUENCES: 2 

    (iv) CORRESPONDENCE ADDRESS: 
        (A) ADDRESSEE: NIXON & VANDERHYE P.C. 
        (B) STREET: 8th Floor, 1100 North Glebe Road 
        (C) CITY: Arlington 
        (D) STATE: Virginia 
        (E) COUNTRY: United States 
        (F) ZIP: 22201-4714 

    (v) COMPUTER READABLE FORM: 
        (A) MEDIUM TYPE: Floppy disk 
        (B) COMPUTER: IBM PC compatible 
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS 
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

    (vi) CURRENT APPLICATION DATA: 
        (A) APPLICATION NUMBER: 
        (B) FILING DATE: 26-SEP-1997 
        (C) CLASSIFICATION: 

    (vii) PRIOR APPLICATION DATA: 
        (A) APPLICATION NUMBER: US 08/723,414 
        (B) FILING DATE: 30-SEP-1996 

    (viii) ATTORNEY/AGENT INFORMATION: 
        (A) NAME: BYRNE, Thomas E. 
        (B) REGISTRATION NUMBER: 32,205 
        (C) REFERENCE/DOCKET NUMBER: 76-105 

    (ix) TELECOMMUNICATION INFORMATION: 
        (A) TELEPHONE: (703) 816-4021 
        (B) TELEFAX: (703) 816-4100 

(2) INFORMATION FOR SEQ ID NO: 1: 

    (i) SEQUENCE CHARACTERISTICS: 
        (A) LENGTH: 1244 base pairs 
        (B) TYPE: nucleic acid 
        (C) STRANDEDNESS: single 
        (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: cDNA 

    (iii) HYPOTHETICAL: NO 

    (iv) ANTI-SENSE: NO 

    (ix) FEATURE: 
        (A) NAME/KEY: CDS 
        (B) LOCATION: 1..1056 

    (ix) FEATURE: 
        (A) NAME/KEY: sig_peptide 
        (B) LOCATION: 1..77 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 


ATG GGT TCC ATG CGT CTA TTA GTA GTG GCA TTG TTG TGT GCA TTT GCT
Met Gly Ser Met Arg Leu Leu Val Val Ala Leu Leu Cys Ala Phe Ala
1               5                   10                  15

ATG CAT GCA GGT TTT TCA GTC TCT TAT GCT CAG CTT ACT CCT ACG TTC
Met His Ala Gly Phe Ser Val Ser Tyr Ala Gln Leu Thr Pro Thr Phe
            20                  25                  30

TAC AGA GAA ACA TGT CCA AAT CTG TTC CCT ATT GTG TTT GGA GTA ATC
Tyr Arg Glu Thr Cys Pro Asn Leu Phe Pro Ile Val Phe Gly Val Ile
        35                  40                  45

TTC GAT GCT TCT TTC ACC GAT CCC CGA ATC GGG GCC AGT CTC ATG AGG
Phe Asp Ala Ser Phe Thr Asp Pro Arg Ile Gly Ala Ser Leu Met Arg
    50                  55                  60

CTT CAT TTT CAT GAT TGC TTT GTT CAA GGT TGT GAT GGA TCA GTT TTG
Leu His Phe His Asp Cys Phe Val Gln Gly Cys Asp Gly Ser Val Leu
65                  70                  75                  80

CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA CTT CCA AAT
Leu Asn Asn Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala Leu Pro Asn
                85                  90                  95

ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC AAG ACA GCG
Ile Asn Ser Ile Arg Gly Leu Asp Val Val Asn Asp Ile Lys Thr Ala
            100                 105                 110

GTG GAA AAT AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT ATT CTT GCT
Val Glu Asn Ser Cys Pro Asp Thr Val Ser Cys Ala Asp Ile Leu Ala
        115                 120                 125

ATT GCA GCT GAA ATA GCT TCT GTT CTG GGA GGA GGT CCA GGA TGG CCA
Ile Ala Ala Glu Ile Ala Ser Val Leu Gly Gly Gly Pro Gly Trp Pro
    130                 135                 140

GTT CCA TTA GGA AGA AGG GAC AGC TTA ACA GCA AAC CGA ACC CTT GCA
Val Pro Leu Gly Arg Arg Asp Ser Leu Thr Ala Asn Arg Thr Leu Ala
145                 150                 155                 160

AAT CAA AAC CTT CCA GCA CCT TTC TTC AAC CTC ACT CAA CTT AAA GCT
Asn Gln Asn Leu Pro Ala Pro Phe Phe Asn Leu Thr Gln Leu Lys Ala
                165                 170                 175

TCC TTT GCT GTT CAA GGT CTC AAC ACC CTT GAT TTA GTT ACA CTC TCA
Ser Phe Ala Val Gln Gly Leu Asn Thr Leu Asp Leu Val Thr Leu Ser
            180                 185                 190

GGT GGT CAT ACG TTT GGA AGA GCT CGG TGC AGT ACA TTC ATA AAC CGA
Gly Gly His Thr Phe Gly Arg Ala Arg Cys Ser Thr Phe Ile Asn Arg
        195                 200                 205

TTA TAC AAC TTC AGC AAC ACT GGA AAC CCT GAT CCA ACT CTG AAC ACA
Leu Tyr Asn Phe Ser Asn Thr Gly Asn Pro Asp Pro Thr Leu Asn Thr
    210                 215                 220

ACA TAC TTA GAA GTA TTG CGT GCA AGA TGC CCC CAG AAT GCA ACT GGG
Thr Tyr Leu Glu Val Leu Arg Ala Arg Cys Pro Gln Asn Ala Thr Gly
225                 230                 235                 240

GAT AAC CTC ACC AAT TTG GAC CTG AGC ACA CCT GAT CAA TTT GAC AAC
Asp Asn Leu Thr Asn Leu Asp Leu Ser Thr Pro Asp Gln Phe Asp Asn
                245                 250                 255

AGA TAC TAC TCC AAT CTT CTG CAG CTC AAT GGC TTA CTT CAG AGT GAC
Arg Tyr Tyr Ser Asn Leu Leu Gln Leu Asn Gly Leu Leu Gln Ser Asp
            260                 265                 270

CAA GAA CTT TTC TCC ACT CCT GGT GCT GAT ACC ATT CCC ATT GTC AAT
Gln Glu Leu Phe Ser Thr Pro Gly Ala Asp Thr Ile Pro Ile Val Asn
        275                 280                 285

AGC TTC AGC AGT AAC CAG AAT ACT TTC TTT TCC AAC TTT AGA GTT TCA
Ser Phe Ser Ser Asn Gln Asn Thr Phe Phe Ser Asn Phe Arg Val Ser
    290                 295                 300

ATG ATA AAA ATG GGT AAT ATT GGA GTG CTG ACT GGG GAT GAA GGA GAA  960
Met Ile Lys Met Gly Asn Ile Gly Val Leu Thr Gly Asp Glu Gly Glu
305                 310                 315                 320

ATT CGC TTG CAA TGT AAT TTT GTG AAT GGA GAC TCG TTT GGA TTA GCT  1008
Ile Arg Leu Gln Cys Asn Phe Val Asn Gly Asp Ser Phe Gly Leu Ala
                325                 330                 335

AGT GTG GCG TCC AAA GAT GCT AAA CAA AAG CTT GTT GCT CAA TCT AAA  1056
Ser Val Ala Ser Lys Asp Ala Lys Gln Lys Leu Val Ala Gln Ser Lys
            340                 345                 350

TAAACCAATA ATTAATGGGG ATGTGCATGC TAGCTAGCAT GTAAAGGCAA ATTAGGTTGT  1116

AAACCTCTTT GCTAGCTATA TTGAAATAAA CCAAAGGAGT AGTGTGCATG TCAATTCGAT  1176

TTTGCCATGT ACCTCTTGGA ATATTATGTA ATAATTATTT GAATCTCTTT AAGGTACTTA  1236

ATTAATCA  1244


(2) INFORMATION FOR SEQ ID NO: 2: 

    (i) SEQUENCE CHARACTERISTICS: 
        (A) LENGTH: 4700 base pairs 
        (B) TYPE: nucleic acid 
        (C) STRANDEDNESS: single 
        (D) TOPOLOGY: linear 

    (ii) MOLECULE TYPE: DNA (genomic) 

    (ix) FEATURE: 
        (A) NAME/KEY: promoter 
        (B) LOCATION: 1..1532 

    (ix) FEATURE: 
        (A) NAME/KEY: sig_peptide 
        (B) LOCATION: 1533..1609 

    (ix) FEATURE: 
        (A) NAME/KEY: exon 
        (B) LOCATION: 1533..1751 

    (ix) FEATURE: 
        (A) NAME/KEY: exon 
        (B) LOCATION: 2383..2574 

    (ix) FEATURE: 
        (A) NAME/KEY: exon 
        (B) LOCATION: 3605..3769 

    (ix) FEATURE: 
        (A) NAME/KEY: exon 
        (B) LOCATION: 4033..4516 

    (ix) FEATURE: 
        (A) NAME/KEY: intron 
        (B) LOCATION: 1752..2382 

    (ix) FEATURE: 
        (A) NAME/KEY: intron 
        (B) LOCATION: 2575..3604 

    (ix) FEATURE: 
        (A) NAME/KEY: intron 
        (B) LOCATION: 3770..4032 

    (ix) FEATURE: 
        (A) NAME/KEY: CDS 
        (B) LOCATION: 1533..1751 

    (ix) FEATURE: 
        (A) NAME/KEY: CDS 
        (B) LOCATION: 2383..2574 

    (ix) FEATURE: 
        (A) NAME/KEY: CDS 
        (B) LOCATION: 3605..3769 

    (ix) FEATURE: 
        (A) NAME/KEY: CDS 
        (B) LOCATION: 4033..4516 

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 


TAGATAAAAA AATGGGATAT AATTTTTCTC AGATGTTGTT TATACTGTTT TTTTAATCAG

AATTAAAATT CCTCTTTAAT TATCGACATA ATTTTTTTTG GTGAATATTA TCGACATAAT

AAATTTTTAT TGTACATAGA AGTGATACTT CAATTTTAAT ATTGGAGAAC

AGTACGAAAA CATAAAAAAA CTGTTATTAG AAGAAAAAAA TATATGGAAA AGGTTAGCTA  240

CATATATTAG CTAAATTAGT TGTTCTAATT GGCTATATAA ACCCTATTGT ACTCTTTGTA  300

ATCTCACCTT TTTCATTTAA ATACATTTCT ACTTTTTAAG TTCTATATTT TCTCTCAATT  360

TTCTTCGATA AACCATGAAA TTTAACATGG TATATCAGCG ATACCACCCA CTTTGAAAGC  420

CATGTATGGC TAGTATGGGC AGCCAAAATT TGCCCTGGTT CAAGCAAAGC AAGTGTTTAT  480

ATAGATGTGA CTTTTGTTGA GGAACTCATG CCAATGGTAC TGATTGTGAA ACTGAGAAAA  540

CTAATTTGGA GAATTTGAAT TATGATCATT AAATACTCCT CTCCTGACTA CCTTCGTCCC  600

TCAAATTTGT ACCATCATTA TTTCCCAAAA ATTTGATTAC AATGCACTAA TTAATGAATG  660

TTTCTTACAT TATCATATTA TCATATCTGA CATTTTGTTT TTACTTTTTA TAATAATTAT  720

TTTAAAAAGT CATACATGCA AATAATTTTT TAATAGTTTA CAGTTAAATT TTTACAGTAA  780

AAATGCATGA AAATTAAACT TTATTTTTCC AAGTCATCAT TTAGTCAAAT CCCAAAACAA  840

TGATTATTTT TTGCAAATGA ATGTTTATTG AACATTTAAA TGTAGCCTAA TTAATTCTGG  900

TTATGGTGTC AATGTTCCAA AACCTAATGC AAGATCTTAG CAAGTACATA CATAGATCTA  960

ATTTTAAACT TATCTTTACG CAAGAGATAT AAAGATTATA CATCTAGTTT TAAACATTAA  1020

CTTTTGTTTT TGTGTTAAAA AACAGTAACA TTTTCTTAAT TTTGTAGAGT GACGTGCTCC  1080

AACCATATTA ACGAAGATTT TAATTGGTAT TCAAGTTCAT GAACTTAGTA AATAAGTTTT  1140

GGTCTTCAGT TTTCAATTTT CATTACAACA TTTATGTAAA ATATCAACGT TTTCTGAAAT  1200

TTGTTGCTTG TGTGCTCCAA CCACATTTAA GAGATTATAG AAATTAATTT TCAAGAAGAT  1260

AATGATTCCT ACTCTTGCTG GCCCTACCAT AGTACAATAA ATCCACTCAT AAATCAACAA  1320

GTCGTCGTCA TAGGCAATTG GGCATCATAT CATAAACAAT ACGTACGTGA TATTATCTAG  1380

TGTCTCTCAG TTTACTTTAT GAGAAATTAT TTTTCTTTAA AAAAAGTTAA TTAATAAAAA  1440

CATTTGCGAT ACCGTGAGTT ACAAGAAATC CGCCGAATTC ATCTCTATAA ATAAAAGGAT  1500

CTATATGAGA GGTAAAATCA TATTAACTCA AA ATG GGT TCC ATG CGT CTA TTA  1553
                                    Met Gly Ser Met Arg Leu Leu
                                            355

GTA GTG GCA TTG TTG TGT GCA TTT GCT ATG CAT GCA GGT TTT TCA GTC  1601
Val Val Ala Leu Leu Cys Ala Phe Ala Met His Ala Gly Phe Ser Val
360                 365                 370                 375

TCT TAT GCT CAG CTT ACT CCT ACG TTC TAC AGA GAA ACA TGT CCA AAT  1649
Ser Tyr Ala Gln Leu Thr Pro Thr Phe Tyr Arg Glu Thr Cys Pro Asn
                380                 385                 390

CTG TTC CCT ATT GTG TTT GGA GTA ATC TTC GAT GCT TCT TTC ACC GAT  1697
Leu Phe Pro Ile Val Phe Gly Val Ile Phe Asp Ala Ser Phe Thr Asp
            395                 400                 405

CCC CGA ATC GGG GCC AGT CTC ATG AGG CTT CAT TTT CAT GAT TGC TTT  1745
Pro Arg Ile Gly Ala Ser Leu Met Arg Leu His Phe His Asp Cys Phe
        410                 415                 420

GTT CAA GTACGTACTT TTTTTTTTCC TTCCAAAATG CCCTGCATAT TTAACAAGAT  1801
Val Gln
    425

TGCTTTGTTC ACCTAGAAAA ATGTGTTTTT TTCAACGATC TTACGTACGT TTGTTTGGTT 1861 

TGAAAAATAA ATCAGAAAGA GATCAAGAAA ATAGCTAGAA AGAAAGCAAC GTTTTTTTAA 1921 

AAGGTATTTA GTGTGAGAAA AATATTAAAA CTGAAGAGAA AGAAATTAAA TAAGCTTTTC 1981

TTGAATGATA TTTACATGTC TTATTAACTT AAAGTCACCT TTTTTCTTTA AGTTGTGCTT 2041

GAAGAAAAAA GATGTCTTTC AGTTTAGTTT TGATTAATGC TAATTATATT TTTAATTAAT 2101

TAATTAATAC TATATATCTA TTTACCATAT TAATTATTAC TATATTTCAT GATGACAACA 2161

GACAAGTATT CTAAAGAGGT ATCGGTAGAT GATTAATTTT TTTATAAAAA AATCTTTTGC 2221

GTGTATAGAT ATTCTTTTAT AATTGGTGCA GAAACTTGTA ATGCTAATTG CAATTAATCT 2281

TACATTGATT AACTAATAGC TATAATCAAT ATTTAGGTTA GGTATAGGAG AGAAATCAAG 2341

TGATCTGAAC AAATTAAGTT GTTATATTTG CATTGTGACA G GGT TGT GAT GGA 2394
Gly Cys Asp Gly
1

TCA GTT TTG CTG AAC AAC ACT GAT ACA ATA GAA AGC GAG CAA GAT GCA 2442
Ser Val Leu Leu Asn Asn Thr Asp Thr Ile Glu Ser Glu Gln Asp Ala
5 10 15 20


CTT CCA AAT ATC AAC TCA ATA AGA GGA TTG GAC GTT GTC AAT GAC ATC 2490
Leu Pro Asn Ile Asn Ser Ile Arg Gly Leu Asp Val Val Asn Asp Ile
25 30 35


AAG ACA GCG GTG GAA AAT AGT TGT CCA GAC ACA GTT TCT TGT GCT GAT 2538
Lys Thr Ala Val Glu Asn Ser Cys Pro Asp Thr Val Ser Cys Ala Asp
40 45 50


ATT CTT GCT ATT GCA GCT GAA ATA GCT TCT GTT CTG GTAATTAATA 2584
Ile Leu Ala Ile Ala Ala Glu Ile Ala Ser Val Leu
55 60


ACTCCTAATT AATTCCCAAC CATTAAAAAG TTGCATGATT GGATTCAAAA TTCTATGGTA 2644


TTGGGGTTCT GATATAAATT TGTAATTAAA TTGCACTAAA AAAAATTATC ATATACTTTT 2704


AATAAAAAAA ATTTATCTAA TTTAATTTAT TATTAAAACT ATTTTTAAAA TTCAATCCTA 2764


ACTCTTTTTT AATCGGAGCA TGTAAGCTGG CACCCACCGT ATATCGTTGG AAGATGCTAT 2824


AAAACCATTT AATTAATGGA TGGAATCAGT CAAAACATTT AATTCAAAAT ACTCTTAATT 2884


GTGATTAGTA ATCATGTTCG GGCAAGTTAC GTTGTGTATA ATTAATTTGA CTTAATCAGA 2944


TAAAAAAACA AATGGACGCA AGCCGGTTGG TATAGATATC ACTGGCCTGT AGAATATGTG 3004


GTTTTTCACG TTTAAATAAA AGCTAGCTAC TATATTATAT TTAGTCTTTT TTTTTCTTAA 3064


ACCCATTTAA CGTGATTTAT TGACTGTGAA ACATGTTTCC ACACACAGGC TTAGAAACTC 3124


CTCGCAACTA ACATCTCCAA AATTTGACTA TTTATTTATG AAGATAATTC ATCTATGATG 3184


TTCAACTCTA TTATATATAT GTATCATCGC AGTATTAAGA ATTATAATAG TCAAATATAG 3244

AAGTATATCG GGTAAATGTA GTTGCATGTG CGACCTGTTT CGTGTAAAAT GCTTATTCTA 3304

TATAGCTTTT TTTATTGGAA AATAACGATG AACTAAAAAC GAAAGGGTAT CATATAGTTT 3364

GACTTTTATG TTAGAGAGAG ACATCTTAAT TTGGTCATAT GTTAAATAAT TAATTACAAT 3424

GCATACACAA ATATTTATGC CATATCTAAA AAATGATAAA ATATCATAGG TATACTCAAC 3484

TATATGATAT CCCCATAACA GAAATTGTAC TTTTCTTCAG GCAATGAACT TAACATTTCT 3544

GTTTGCTAAA AACAAACATC CACTTAAAGT GGTTCAACAT ATTTATGTAA TAATTTACAG 3604

GGA GGA GGT CCA GGA TGG CCA GTT CCA TTA GGA AGA AGG GAC AGC TTA 3652
Gly Gly Gly Pro Gly Trp Pro Val Pro Leu Gly Arg Arg Asp Ser Leu
1 5 10 15

ACA GCA AAC CGA ACC CTT GCA AAT CAA AAC CTT CCA GCA CCT TTC TTC 3700
Thr Ala Asn Arg Thr Leu Ala Asn Gln Asn Leu Pro Ala Pro Phe Phe
20 25 30

AAC CTC ACT CAA CTT AAA GCT TCC TTT GCT GTT CAA GGT CTC AAC ACC 3748
Asn Leu Thr Gln Leu Lys Ala Ser Phe Ala Val Gln Gly Leu Asn Thr
35 40 45

CTT GAT TTA GTT ACA CTC TCA GGTATACATA ATCAATTTTT TATTTGCTAT 3799
Leu Asp Leu Val Thr Leu Ser
50 55


TAGCTAGCAA TAAAAAGTCT CTGATACAGA CATATTTAGA TAAATTAATT TCTCCATAAA 3859


CATTTATAAT AAAATTATCA ATTTATGTAC TTAAAAATTA TGGATTGAAG CTCTTTTCAT 3919


CCAACTTTTA CTAAAGTTAA GGTGCATATA ATATAAAATA AACTATCTCT TGTTTCTTAT 3979


AAAAAGATTG AAGATAAGTT AAAGTCTACT TATAAATCAT TAATATATGT ATA GGT 4035

Gly 
1 


GGT CAT ACG TTT GGA AGA GCT CGG TGC AGT ACA TTC ATA AAC CGA TTA 4083
Gly His Thr Phe Gly Arg Ala Arg Cys Ser Thr Phe Ile Asn Arg Leu
5 10 15


TAC AAC TTC AGC AAC ACT GGA AAC CCT GAT CCA ACT CTG AAC ACA ACA 4131
Tyr Asn Phe Ser Asn Thr Gly Asn Pro Asp Pro Thr Leu Asn Thr Thr
20 25 30


TAC TTA GAA GTA TTG CGT GCA AGA TGC CCC CAG AAT GCA ACT GGG GAT 4179
Tyr Leu Glu Val Leu Arg Ala Arg Cys Pro Gln Asn Ala Thr Gly Asp
35 40 45


AAC CTC ACC AAT TTG GAC CTG AGC ACA CCT GAT CAA TTT GAC AAC AGA 4227
Asn Leu Thr Asn Leu Asp Leu Ser Thr Pro Asp Gln Phe Asp Asn Arg
50 55 60 65


TAC TAC TCC AAT CTT CTG CAG CTC AAT GGC TTA CTT CAG AGT GAC CAA 4275
Tyr Tyr Ser Asn Leu Leu Gln Leu Asn Gly Leu Leu Gln Ser Asp Gln
70 75 80


GAA CTT TTC TCC ACT CCT GGT GCT GAT ACC ATT CCC ATT GTC AAT AGC 4323
Glu Leu Phe Ser Thr Pro Gly Ala Asp Thr Ile Pro Ile Val Asn Ser
85 90 95


TTC AGC AGT AAC CAG AAT ACT TTC TTT TCC AAC TTT AGA GTT TCA ATG 4371
Phe Ser Ser Asn Gln Asn Thr Phe Phe Ser Asn Phe Arg Val Ser Met
100 105 110

ATA AAA ATG GGT AAT ATT GGA GTG CTG ACT GGG GAT GAA GGA GAA ATT 4419
Ile Lys Met Gly Asn Ile Gly Val Leu Thr Gly Asp Glu Gly Glu Ile
115 120 125

CGC TTG CAA TGT AAT TTT GTG AAT GGA GAC TCG TTT GGA TTA GCT AGT 4467
Arg Leu Gln Cys Asn Phe Val Asn Gly Asp Ser Phe Gly Leu Ala Ser
130 135 140 145

GTG GCG TCC AAA GAT GCT AAA CAA AAG CTT GTT GCT CAA TCT AAA TAA 4515
Val Ala Ser Lys Asp Ala Lys Gln Lys Leu Val Ala Gln Ser Lys *
150 155 160

ACCAATAATT AATGGGGATG TGCATGCTAG CTAGCATGTA AAGGCAAATT AGGTTGTAAA 4575

CCTCTTTGCT AGCTATATTG AAATAAACCA AAGGAGTAGT GTGCATGTCA ATTCGATTTT 4635

GCCATGTACC TCTTGGAATA TTATGTAATA ATTATTTGAA TCTCTTTAAG GTACTTAATT 4695

AATCA 4700
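
The coding segments of the listing can be cross-checked against the 352 amino acid precursor length stated in the abstract. The following is a minimal Python sketch, assuming exon boundaries read off the listing by counting back from the printed line-end positions; the coordinates in `EXONS` are an interpretation of the listing and are not stated anywhere in the text, and the function names are illustrative only.

```python
# Exon boundaries (1-based, inclusive) inferred from the line-end positions
# printed in the sequence listing. These are assumptions, not stated values.
EXONS = [(1533, 1751), (2383, 2574), (3605, 3769), (4033, 4515)]


def cds_length(exons):
    """Total number of coding nucleotides summed over all exons."""
    return sum(end - start + 1 for start, end in exons)


def protein_length(exons):
    """Number of encoded amino acids, excluding the terminal stop codon."""
    nt = cds_length(exons)
    if nt % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    return nt // 3 - 1


if __name__ == "__main__":
    print(cds_length(EXONS), protein_length(EXONS))
```

Under these assumed boundaries the four exons sum to 1059 coding nucleotides, that is 352 codons plus the stop codon, which agrees with the precursor length given in the abstract.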
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. An isolated DNA molecule comprising the nucleotide sequence of SEQ ID
NO:1.

2. An isolated DNA molecule comprising at least 24 contiguous nucleotides
selected from nucleotides 1-1532 of SEQ ID NO:2.

3. The isolated DNA molecule comprising a nucleotide sequence substantially 
homologous to nucleotides 1533-4700 of SEQ ID NO:2. 

4. The isolated DNA molecule of claim 3 comprising a nucleotide sequence 
substantially homologous to that of nucleotides 1-4700 of SEQ ID NO:2. 

5. The isolated DNA molecule of claim 3 comprising nucleotides 1533-4700 of 
SEQ ID NO:2. 

6. The isolated DNA molecule of claim 4 comprising the nucleotide sequence of 
SEQ ID NO:2. 

7. The isolated DNA molecule of claim 2 comprising a nucleotide sequence 
substantially homologous to that of 1-1532 of SEQ ID NO:2. 

8. The isolated DNA molecule of claim 7, comprising the nucleotide sequence of 
nucleotides 1-1532 of SEQ ID NO:2. 


9. An isolated DNA molecule of claim 3 comprising at least 32 contiguous
nucleotides selected from nucleotides 412-1041 of SEQ ID NO:2.

10. An isolated DNA molecule of claim 9 comprising the nucleotide sequence of
412-1041 of SEQ ID NO:2.

11. An isolated DNA molecule of claim 3 comprising at least 23 contiguous 
nucleotides selected from nucleotides 1234-2263 of SEQ ID NO:2. 

12. An isolated DNA molecule of claim 11 comprising the nucleotide sequence of 
1234-2263 of SEQ ID NO:2. 

13. An isolated DNA molecule of claim 3 comprising at least 22 contiguous 
nucleotides selected from nucleotides 2430-2691 of SEQ ID NO:2. 

14. An isolated DNA molecule of claim 13 comprising the nucleotide sequence of 
2430-2691 of SEQ ID NO:2. 

15. A vector which comprises the DNA molecule of claim 1. 

16. A vector which comprises the DNA molecule of claim 2. 

17. A vector which comprises the DNA molecule of claim 3. 

18. The vector of claim 16 which comprises a heterologous gene of interest under 
control of the DNA molecule. 

19. A host cell capable of expressing the DNA molecule within the vector of claim 
15. 

20. A host cell capable of expressing the DNA molecule within the vector of claim 
16. 
21. A host cell capable of expressing the DNA molecule within the vector of claim
17. 

22. A host cell capable of expressing the DNA molecule within the vector of claim 
18. 

23. A transgenic plant comprising the vector of claim 15. 

24. A transgenic plant comprising the vector of claim 16. 

25. A transgenic plant comprising the vector of claim 17. 

26. A transgenic plant comprising the vector of claim 18. 

27. A method for the production of soybean seed coat peroxidase in a host cell
comprising:

i) transforming the host cell with a vector comprising an isolated DNA
molecule selected from the group consisting of SEQ ID NO:1 and SEQ ID
NO:2; and

ii) culturing the host cell under conditions to allow expression of the soybean 
seed coat peroxidase. 

28. A process for producing a heterologous gene of interest comprising propagating 
a transformed plant with the vector of claim 16. 

29. The process of claim 28 wherein the heterologous gene of interest is produced 
within seed coat cells. 
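
Several of the claims above recite a minimum number of contiguous nucleotides selected from a stated range of SEQ ID NO:2 (for example, at least 24 from nucleotides 1-1532). As an illustration only, and without any bearing on claim construction, the following Python sketch shows one way such a condition might be tested computationally. The function name, its parameters and `TOY_REFERENCE` are hypothetical; `TOY_REFERENCE` is a single 60-nucleotide line copied from the sequence listing and numbered 1-60 for the example, not the full SEQ ID NO:2.

```python
# Toy stand-in for a reference sequence: one 60-nt line of the listing,
# renumbered 1..60 here purely for illustration.
TOY_REFERENCE = "CATTTGCGATACCGTGAGTTACAAGAAATCCGCCGAATTCATCTCTATAAATAAAAGGAT"


def shares_contiguous_run(candidate, reference, start, end, min_len):
    """Return True if `candidate` contains at least `min_len` contiguous
    nucleotides taken from positions start..end (1-based, inclusive)
    of `reference`."""
    region = reference[start - 1:end].upper()
    candidate = candidate.upper()
    if min_len <= 0 or len(region) < min_len:
        return False
    # Every window of exactly min_len nucleotides inside the region.
    windows = {region[i:i + min_len]
               for i in range(len(region) - min_len + 1)}
    # A longer shared run always contains a shared window of min_len.
    return any(candidate[i:i + min_len] in windows
               for i in range(len(candidate) - min_len + 1))


if __name__ == "__main__":
    probe = "NN" + TOY_REFERENCE[5:35] + "NN"   # carries a 30-nt run
    print(shares_contiguous_run(probe, TOY_REFERENCE, 1, 60, 24))
```

The sketch compares exact matches on one strand only; reverse-complement matching and the "substantially homologous" language of the claims are outside its scope.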


ABSTRACT OF THE DISCLOSURE 

A novel seed coat specific peroxidase genomic sequence is characterized and
presented. Adjacent DNA regulatory regions have also been characterized. The seed
coat peroxidase is translated as a 352 amino acid precursor protein of 38 kDa
comprising a 26 amino acid signal sequence which, when cleaved, results in a 35 kDa
protein. Plants containing a dominant Ep allele accumulate large amounts of
peroxidase in the hourglass cells of the subepidermis. Homozygous recessive epep
genotypes do not accumulate peroxidase in the hourglass cells and are much reduced
in total seed coat peroxidase activity. Probes derived from the cDNA or genomic
DNA can be used to detect polymorphisms that distinguish EpEp and epep
genotypes. Cosegregation of the polymorphisms in an F2 population from a cross of
EpEp and epep plants shows that the Ep locus encodes the seed coat peroxidase
protein. Comparison of Ep and ep alleles indicates that the recessive gene lacks 87 bp
of sequence encompassing the translation start codon. The heterologous expression
of the seed coat peroxidase, as well as vectors and hosts to be used for its expression,
is also disclosed. The seed-specific DNA regulatory region may be used to control
expression of genes of interest such as i) genes encoding herbicide resistance, ii) genes
for biological control of insects or pathogens (e.g. B. thuringiensis), iii) viral coat
proteins to protect against viral infections, iv) proteins of commercial interest (e.g.
pharmaceuticals), and v) proteins that alter the nutritive value, taste, or processing of
seeds.
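
The relationship between the precursor and the 26 amino acid signal sequence can be illustrated with a short Python sketch. It translates the first 81 nucleotides of the coding sequence (taken from the opening lines of Figure 1) using the standard genetic code and splits the product after residue 26. The function names and the constant `CDS_START` are illustrative choices; only the nucleotide string and the signal length of 26 come from the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

# First 27 codons of the coding sequence as printed in Figure 1.
CDS_START = ("ATGGGTTCCATGCGTCTATT"
             "AGTAGTGGCATTGTTGTGTGCATTTGCTATGCATGCAGGTTTTTCAGTCTCTTATGCTCA"
             "G")


def translate(cds):
    """Translate a coding sequence codon by codon (no stop handling)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def split_signal(precursor, signal_len=26):
    """Split a precursor into (signal peptide, remainder)."""
    return precursor[:signal_len], precursor[signal_len:]


if __name__ == "__main__":
    signal, rest = split_signal(translate(CDS_START))
    print(signal, rest)
```

With a 352 amino acid precursor, removal of the 26 residue signal sequence leaves a 326 residue mature protein beginning at the glutamine that follows the signal sequence.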
Figure 1

ATGGGTTCCATGCGTCTATT 2 0 

M G S M R L L 

prx9 + > 

AGTAGTGGCATTGTTGTGTGCATTTGCTATGCATGCAGGTTTTTCAGTCTCTTATGCTCA 80 

V V A L L C A F A M H A G F S V S Y A Q 1

signal sequence


--prx10- prx2 + >

AGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAA
ESEQDALPNINSIRGLDVVN


AGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCTTGCAAATCAAAACCTTCCAGCACC
GRRDSLTANRTLANQNLPAP

TTTCTTCAACCTCACTCAACTTAAAGCTTCCTTTGCTGTTCAAGGTCTCAACACCCTTGA 
F FNLTQLKASFAVQGLNTLD 

III 

TTTAGTTACACTCTCAGGTGGTCATACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAA
LVTLSGGHTFGRARCSTFIN
heme-binding domain


CCGATTATACAACTTCAGCAACACTGGAAACCCTGATCCAACTCTGAACACAACATACTT

RLYNFSNTGNPDPTLNTTYL

AGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGATAACCTCACCAATTTGGA 
EVLRARCPQNATGDNLTNLD 

CCTGAGCACACCTGATCAATTTGACAACAGATACTACTCCAATCTTCTGCAGCTCAATGG 
LSTPDQFDNRYYSNLLQLNG

CTTACTTCAGAGTGACCAAGAACTTTTCTCCACTCCTGGTGCTGATACCATTCCCATTGT 


140 
21 


GCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTATTGTGTTTGGAGT 
LTPTFYRETCPNLFPIVFGV 

prxl2+ > 

AATCTTCGATGCTTCTTTCACCGATCCCCGAATCGGGGCCAGTCTCATGAGGCTTCATTT 

IFDASFTDPRI G A S L M R L H F 

active site 

I < 

1 GCATCATATCATAAACAATACGTACGTGATATTATCTAGTGTCTCTCAGTTTACTTTATG 
61 AGAAATTATTTTTCTTTAAAAAAAGTTAATTAATAAAAACATTTGCGATACCGTGAGTTA
121 CAAGAAATCCGCCGAATTCATCTCTATAAATAAAAGGATCTATATGAGAGGTAAAATCAT 
181 ATTAACTCAAAATGGGTTCCATGCGTCTATTAGTAGTGGCATTGTTGTGTGCATTTGCTA 
241 TGCATGCAGGTTTTTCAGTCTCTTATGCTCAGCTTACTCCTACGTTCTACAGAGAAACAT 
301 GTCCAAATCTGTTCCCTATTGTGTTTGGAGTAATCTTCGATGCTTCTTTCACCGATCCCC 
361 GAATCGGGGCCAGTCTCATGAGGCTTCATTTTCATGATTGCTTTGTTCAAGTACGTACTT 
421 TTTTTTTTCCTTCCAAAATGCCCTGCATATTTAACAAGATTGCTTTGTTCACCTAGAAAA 
481 ATGTGTTTTTTTCAACGATCTTACGTACGTTTGTTTGGTTTGAAAAATAAATCAGAAAGA
541 GATCAAGAAAATAGCTAGAAAGAAAGCAACGTTTTTTTAAAAGGTATTTAGTGTGAGAAA 
601 AATATTAAAACTGAAGAGAAAGAAATTAAATAAGCTTTTCTTGAATGATATTTACATGTC 
661 TTATTAACTTAAAGTCACCTTTTTTCTTTAAGTTGTGCTTGAAGAAAAAAGATGTCTTTC 
721 AGTTTAGTTTTGATTAATGCTAATTATATTTTTAATTAATTAATTAATACTATATATCTA 
781 TTTACCATATTAATTATTACTATATTTCATGATGACAACAGACAAGTATTCTAAAGAGGT 
841 ATCGGTAGATGATTAATTTTTTTATAAAAAAATCTTTTGCGTGTATAGATATTCTTTTAT 
901 AATTGGTGCAGAAACTTGTAATGCTAATTGCAATTAATCTTACATTGATTAACTAATAGC 
961 TATAATCAATATTTAGGTTAGGTATAGGAGACAAATCAAGTGATCTGAACAAATTAAGTT 
1021 GTTATATTTGCATTGTGACAGGGTTGTGATGGATCAGTTTTGCTGAACAACACTGATACA 
1081 ATAGAAAGCGAGCAAGATGCACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTC 
1141 AATGACATCAAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATATT 
1201 CTTGCTATTGCAGCTGAAATAGCTTCTGTTCTGGTAATTAATAACTCCTAATTAATTCCC
1261 AACCATTAAAAAGTTGCATGATTGGATTCAAAATTCTATGGTATTGGGGTTCTGATATAA
1321 ATTTGTAATTAAATTGCACTAAAAAAAATTATCATATACTTTTAATAAAAAAAATTTATC
1381 TAATTTAATTTATTATTAAAACTATTTTTAAAATTCAATCCTAACTCTTTTTTAATCGGA
1441 GCATGTAAGCTGGCAGCCACCGTATATCGTTGGAAGATGCTATAAAACCATTTAATTAAT 
1501 GGATGGAATCAGTCAAAACATTTAATTCAAAATACTCTTAATTGTGATTAGTAATCATGT 
1561 TCGGGCAAGTTACGTTGTGTATAATTAATTTGACTTAATCAGATAAAAAAACAAATGGAC
1621 GCAAGCCGGTTGGTATAGATATCACTGGCCTGTAGAATATGTGGTTTTTCACGTTTAAAT 
1681 AAAAGCTAGCTACTATATTATATTTAGTCTTTTTTTTTCTTAAACCCATTTAACGTGATT
1741 TATTGACTGTGAAACATGTTTCCACACACAGGCTTAGAAACTCCTCGCAACTAACATCTC 
1801 CAAAATTTGACTATTTATTTATGAAGATAATTCATCTATGATGTTCAACTCTATTATATA
1861 TATGTATCATCGCAGTATTAAGAATTATAATAGTCAAATATAGAAGTATATCGGGTAAAT 
1921 GTAGTTGCATGTGCGACCTGTTTCGTGTAAAATGCTTATTCTATATAGCTTTTTTTATTG 
1981 GAAAATAACGATGAAGTAAAAACGAAAGGGTATCATATAGTTTGACTTTTATGTTAGAGA 
2041 GAGACATCTTAATTTGGTCATATGTTAAATAATTAATTACAATGCATACACAAATATTTA
2101 TGCCATATCTAAAAAATGATAAAATATCATAGGTATACTCAACTATATGATATCCCCATA 
2161 ACAGAAATTGTACTTTTCTTCAGGCAATGAACTTAACATTTCTGTTTGCTAAAAACAAAC 
2221 ATCCACTTAAAGTGGTTCAACATATTTATGTAATAATTTACAGGGAGGAGGTCCAGGATG
2281 GCCAGTTCCATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAACCGTTGCAAATCAAAA
2341 CCTTCCAGCACCTTTCTTCAACCTCACTCAACTTAAAGCTTCCTTTGCTGTTCAAGGTCT 
2401 CAACACCCTTGATTTAGTTACACTCTCAGGTATACATAATCAATTTTTTATTTGCTATTA 
2461 GCTAGCAATAAAAAGTCTCTGATACAGACATATTTAGATAAATTAATTTCTCCATAAACA 
2521 TTTATAATAAAATTATCAATTTATGTACTTAAAAATTATGGATTGAAGCTCTTTTCATCC
2581 AACTTTTACTAAAGTTAAGGTGCATATAATATAAAATAAACTATCTCTTGTTTCTTATAA
2641 AAAGATTGAAGATAAGTTAAAGTCTACTTATAAATCATTAATATATGTATAGGTGGTCAT 
2701 ACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAACCGATTATACAACTTCAGCAACACT 
2761 GGAAACCCTGATCCAACTCTGAACACAACATACTTAGAAGTATTGCGTGCAAGATGCCCC 
2821 CAGAATGCAACTGGGGATAACCTCACCAATTTGGACCTGAGCACACCTGATCAATTTGAC 

2881 AACAGATACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTGACCAAGAACTT
2941 TTCTCCACTCCTGGTGCTGATACCATTCCCATTGTCAATAGCTTCAGCAGTAACCAGAAT
3001 ACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAAAATGGGTAATATTGGAGTGCTGACT
3061 GGGGATGAAGGAGAAATTCGCTTGCAATGTAATTTTGTGAATGGAGACTCGTTTGGATTA
3121 GCTAGTGTGGCGTCCAAAGATGCTAAACAAAAGCTTGTTGCTCAATCTAAATAAACCAAT
3181 AATTAATGGGGATGTGCATGCTAGCTAGCATGTAAAGGCAAATTAGGTTGTAAACCTCTT
3241 TGCTAGCTATATTGAAATAAACCAAAGGAGTAGTGTGCATGTCAATTCGATTTTGCCATG
3301 TACCTCTTGGAATATTATGTAATAATTATTTGAATCTCTTTAAGGTACTTAATTAATCA


Figure 3A 

L78163 ATGGGTTCCATGCGT - CTATTAGTAGTGGCATTGTTG 36
U41657 0
X90693 G GCAAA- CAATGAACTCCCTTCGTGCTGTAGCAATAG - CTTTGTGC 44
X90694 GCTCTTCAAAACAATGAACTCC TTAGCAACTT - CTATGTGG 40
L36156 CTCC TTAGCAACTT - CTATGTGG 22
X90692 AATGCTTGGT CTAAGTGCAACAGCTTTTTGCTGTATGG 38

L78163 TGT GCATTT - GCTATGCATGCAGGTTTTTCAGT CTCTTATGC 77
U41657 0
X90693 TGTATTGTG GTTGTGCTTGGAGGGTTACCCTTCTCTTCAAATGC 88
X90694 TGTGTTGTGCTTTTAGTTGTGCTTGGAGGACTACCCTTTTCCTCAGATGC 90
L36156 TGTGTTGTGCTTTTAGTTGTGCTTGGAGGACTACCCTTTTCCTCAGATGC 72
X90692 TGT - TTGTGCTAAT TGGAGGAGTACCCTTTT CAAATGC 75

L78163 TCAGCTTACTCCTACGTTCTACAGAGAAACATGTCCAAATCTGTTCCCTA 127
U41657 0
X90693 GCAACTTGATCCATCCTTTTACAGGAACACTTGTCCAAATGTTAGTTCCA 138
X90694 ACAACTTAGTCCCACTTTTTACAGCAAAACGTGTCCAACTGTTAGTTCCA 140
L36156 ACAACTTAGTCCCACTTTTTACAGCAAAACGTGTCCAACTGTTAGTTCCA 122
X90692 ACAACTAGATCCTTCATTTTACAACAGTACATGTTCTAATCTTGATTCAA 125

L78163 TTGTGTTTGGAGTAATCTTCGATGCTTCTTTCACCGATCCCCGAATCGGG 177
U41657 0
X90693 TTGTTCGTGAAGTCATAAGGAGTGTTTCTAAGAAAGATCCTCGTATGCTT 188
X90694 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCGCATGCTT 190
L36156 TTGTTAGCAATGTCTTAACAAACGTTTCTAAGACAGATCCTCGCATGCTT 172
X90692 TCGTACGTGGTGTGCTCACAAATGTTTCACAATCTGATCCCAGAATGCTT 175

L78163 GCCAGTCTCATGAGGCTTCATTTTCATGATTGCTTTGTTCAAGGTTGTGA 227
U41657 TTTCATGATTGCTTTGTTCAAGGTTGTGA 29
X90693 GCTAGTCTTGTCAGGCTTCACTTTCATGACTGTTTTGTTCAAGGTTGTGA 238
X90694 GCTAGTCTCGTCAGGCTTCACTTTCATGACTGTTTTGTTCTGGGATGTGA 240
L36156 GCTAGTCTCGTCAGGCTTCACTTTCATGACTGTTTTGTTCTGGGATGTGA 222
X90692 GGTAGTCTCATCAGGCTACATTTTCATGACTGTTTTGTTCAAGGTTGCGA 225
******** ** *******. .**.** **

L78163 TGGATCAGTTTTGCTGAACAACACTGATACAATAGAAAGCGAGCAAGATG 277
U41657 TGGATCAGTTTTACTGAACAACACTGATACAATAGAAAGCGAGCAAGATG 79
X90693 TGCATCAGTTTTACTAAACAAAACTGATACCGTTGTGAGTGAACAAGATG 288
X90694 TGCCTCAGTTTTGCTGAACAATACTGCTACAATCGTAAGCGAACAACAAG 290
L36156 TGCCTCAGTTTTGCTGAACAATACTGCTACAATCGTAAGCGAACAACAAG 272
X90692 TGCCTCGATTTTGCTGAACGATACGGCTACAATAGTGAGCGAGCAAAGTG 275
**** ** *** * **.* *** .* *..** **.*** ..*

L78163 CACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAATGACATC 327
U41657 CACTTCCAAATATCAACTCAATAAGAGGATTGGACGTTGTCAATGACATC 129
X90693 CTTTTCCAAACAGAAACTCATTAAGAGGTTTGGATGTTGTGAATCAAATC 338
X90694 CTTTTCCAAATAACAACTCTCTAAGAGGTTTGGATGTTGTGAATCAGATC 340
L36156 CTTTTCCAAATAACAACTCTCTAAGGGGTTTGGATGTTGTGAATCAGATC 322
X90692 CACCACCAAATAACAACTCCATAAGAGGTTTGGATGTGATAAACCAGATC 325
* ***** * ***** ****.**.***** **..* ** * ***


L78163 AAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATAT 377
U41657 AAGACAGCGGTGGAAAATAGTTGTCCAGACACAGTTTCTTGTGCTGATAT 179
X90693 AAAACAGCTGTGGAAAAGGCTTGTCCTAACACAGTTTCTTGTGCTGATAT 388
X90694 AAACTGGCTGTAGAAGTGCCTTGTCCTAACACAGTTTCTTGTGCTGATAT 390
L36156 AAAACTGCTGTAGAAAGTGCTTGTCCTAACACAGTTTCTTGTGCTGATAT 372
X90692 AAAACAGCGGTGGAAAATGCTTGTCCTAACACAGTTTCTTGTGCTGATAT 375
** ** ** *** ****** **********************


L78163 TCTTGCTATTGCAGCTGAAATAGCTTCTGTT - CTGGGAGGAGGTC CAGGA 426
U41657 TCTTGCTATTGCAGCTGAAATAGCTTCTGTTGCTGGGAGGAGGTC - AGGA 228
X90693 TCTTGCTCTTTCTGCTGAATTATCATCTACA- CTGGCAGATGGTCCTGAC 437
X90694 TCTTGCACTTGCTGCTCAAGCATCCTCTGTT - CTGGCACAAGGTCCTAGT 439
L36156 TCTTGCACTTGCT CAAGCATCCTCTGTT - CTGGCACAAGGTCCTAGT 418
X90692 TCTTGCTCTTTCTGCTGAAATATCATCTGAT-CTGGCAAATGGTCCTACT 424
******. **.*. **. *.* ***. . **** * ..****

L78163 TGGCCAGTTCCATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCT 476
U41657 TGGCCAGTTCCATTAGGAAGAAGGGACAGCTTAACAGCAAACCGAACCCT 278
X90693 TGGAAGGTTCCTTTAGGAAGAAGAGATGGTTTAACGGCAAACCAGTTACT 487
X90694 TGGACGGTTCCTTTAGGAAGAAGGGATGGTTTAACCGCAAACCGAACACT 489
L36156 TGGACGGTTCCTTTAGGAAGAAGGGATGGTTTAACCGCAAACCGAACACT 468
X90692 TGGCAAGTTCCATTAGGAAGAAGGGATAGTTTGACAGCAAATAATTCCCT 474
*** ****************.** .* **.** ***** **

L78163 TGCAAATCAAAACCTTCCAGCACCTTTCTTCAA- -CCTCA- CTCAACTTA 523
U41657 TGCAAATCAAAACCTTCCAGCACCTTTCTTCAA- - CCTCA- CTCAACTTA 325
X90693 TGCTAATCAAAATCTTCCAGCTCC - - - TTTCAATACTACTGATCAACTTA 534
X90694 TGCAAATCAAAATCTTCCGGCTCC- - - ATTCAATTCCTTGGATCAACTTA 536
L36156 TGCAAATCAAAATCTTCCGGCTCC - - - ATTCAATTCCTTGGATCACCTTA 515
X90692 TGCAGCTCAAAATCTTCCTGCCCCCACTTTCAA- - CCTTA- CTCGACTAA 521
*** ****** ***** ** ** ***** * . . **. **.*

L78163 AAGCTTCCTTTG- CTGTTCAAGGTCTCAACACCCTTGATTTAGTTACACT 572
U41657 AAGCTTCCTTTG - CTGTTCAAGGTCTCAACACCCTTGATTTAGTTACACT 374
X90693 AAGCTGCATTTG - CTGCTCAAGGTCTCGATACTACTGATCTGGTTGCACT 583
X90694 AAGCTGCATTT- ACTGCTCAAGGCCTCAATACTACTGATCTAGTTGCACT 585
L36156 AA- CTGCATTTGACTGCTCAAGGCCTCATTACTCCTGTTCTAGTTGCCCT 564
X90692 AATCTAACTTTGA- TAATCAAAACCTCAGTACTACTGATCTAGTTGCACT 570
** **. *** *. ****.. ***.. ** **.* *.***.* **

L78163 CTCAGGTGGTCATACGTTTGGAAGAGCTCGGTGCAGTACATTCATAAACC 622
U41657 CTCAGGTGGTCATACGTCTGGAAGAGCTCGGTGCAGTACATTCATAAACC 424
X90693 CTCCGGTGCTCATACATTTGGAAGAGCTCATTGCTCTTTATTTGTTAGCC 633
X90694 CTCGGGTGCTCATACATTTGGAAGAGCTCATTGCGCACAATTTGTTAGTC 635
L36156 CTCGGGTGCTCATACATTTGGAAGAGCTCATTGCGCACAATTTGTTAGTC 614
X90692 CTCAGGTGGCCATACAATTGGAAGAGGTCAATGCAGATTTTTCGTTGATC 620
*** **** *****.. ******** **..***. . .** *


L78163 GATTATACAACTTCAGCAACACTGGAAACCCTGATCCAACTCTGAACACA 672
U41657 GATTATACAACTTCAGCAACACTGGA CTGATCCA- CT - TGGACACA 468
X90693 GATTGTACAACTTCAGCGGTACGGGAAGTCCCGATCCAACTCTTAACACA 683
X90694 GATTGTACAACTTCAGCAGTACTGGAAGTCCCGATCCAACTCTTAACACA 685
L36156 GATTGTACAACTTCAGCAGTACTGGAAGTCCCGATCCAACTCTTAACACA 664
X90692 GATTATACAATTTCAGCAACACTGGAAACCCCGATTCAACTCTTAACACG 670
**** ***** ****** ** *** * *** ** ** *_****,

L78163 ACATACTTAGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGA 722
U41657 ACATACTTAGAAGTATTGCGTGCAAGATGCCCCCAGAATGCAACTGGGGA 518
X90693 ACTTACTTACAACAATTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 733
X90694 ACTTACTTACAACAACTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 735
L36156 ACTTACTTACAACAACTGCGCACAATATGTCCCAATGGTGGACCTGGCAC 714
X90692 ACCTATTTACAAACATTGCAAGCAATATGTCCCAATGGTGGACCTGGTAC 720
** ** *** ** * ***. .***.*** *** *...** * **** .

L78163 TAACCTCACCAATTTGGACCTGAGCACACCTGATCAATTTGACAACAGAT 772
U41657 TAAGCTCACCAATTTGGACCTGAGCACACCTGATCAATTTGACAACAGAT 568
X90693 GAACCTTACCAATTTCGATCCAAGGACTCCTGATAAATTTGACAAGAACT 783
X90694 AAACCTTACCAATTTCGATCCAACGACTCCTGATAAATTTGACAAGAACT 785
L36156 AAACCTTACCAATTTCGATCCAACGACTCCTGATAAATTTGACAAGAACT 764
X90692 AAACCTAACCGATTTGGACCCAACCACACCAGATACATTTGACTCCAACT 770
***** *** **** ** * .* **.**.*** *******, *. *

L78163 ACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTGACCAAGAA 822
U41657 ACTACTCCAATCTTCTGCAGCTCAATGGCTTACTTCAGAGTGACCAAGAA 618
X90693 ATTACTCTAATCTTCAAGTGAAAAAAGGTTTGCTTCAAAGTGATGAAGAG 833
X90694 ATTACTCCAATCTTCAAGTGAAAAAGGGTTTGCTCCAAAGTGATCAAGAG 835
L36156 ATTACTCCAATCTTCAAGTGAAAAAGGGTTTGCTCCAAAGTGATCAAGAG 814
X90692 ACTACTCCAATCTCCAAGTTGGAAAGGGCTTGTTTCAGAGTGACCAAGAG 820
* ***** ***** * **.** **. * **.***** **+**.

L78163 CTTTTCTCCACTCCTGGTGCTGATACCATTCCCATTGTCAATAGCTTCAG 872
U41657 CGTTTCTCCACTCCTGGTGCTGATACCATTCC - ATTGTCAATAGCTTCAG 667
X90693 TTGTTCTCAACATCTGGTTCAGATACCATTAGCATTGTCAACAAATTCGC 883
X90694 TTGTTCTCAACTTCTGGTGCAGATACCATTAGCATTGTCAACAAATTCAG 885
L36156 TTGTTCTCAACTTCTGGTGCAGATACCATTAGCATTGTCGACAAATTCAG 864
X90692 CTTTTTTCCAGAAATGGTTCTGACACTATTTCTATTGTCAATAGTTTCGC 870
** ** * **** * ** ** *** ******.* *. ***.

L78163 CAGTAACCAGAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAAAA 922
U41657 CG--AACCAGAATACTTTCTTTTCCAACTTTAGAGTTTCAATGATAAAAA 715
X90693 AACCGATCAAAAAGCTTTTTTTGAGAGCTTTAGGGCTGCTATGATCAAAA 933
X90694 CACCGATCAAAATGCTTTCTTTGAGAGCTTTAAGGCTGCAATGATTAAAA 935
L36156 CACCGATCAAAATGCTTTCTTTGAGAGCTTTAAGGCTGCAATGATTAAAA 914
X90692 CAATAATCAAACTCTCTTCTTTGAAAATTTTGTAGCCTCAATGATAAAAA 920
* ** * ** ***. *. ***_.* .*.***** ****

L78163 TGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCGCTTGCAA 972
U41657 TGGGTAATATTGGAGTGCTGACTGGGGATGAAGGAGAAATTCGCTTGCAA 765
X90693 TGGGAAATATTGGTGTGTTAACCGGGAACCAAGGAGAGATTAGAAAACAA 983
X90694 TGGGCAATATTGGTGTGCTAACAGGGACAAAAGGAGAGATTAGAAAACAA 985
L36156 TGGGCAATATTGGTGTGCTAACAGGGACAAAAGGAGAGATTAGAAAACAA 964
X90692 TGGGTAATATTGGAGTTTTAACTGGATCTCAAGGTGAAATTAGAACACAG 970
**** ** *** * **


L78163 TGTAATTTTGTGAA TGGAGACTCGT TTGGATTAGC 1007
U41657 TGTAATTTTGTGAA TGGAGACTCGT TTGGATTAGC 800
X90693 TGCAACTTTGTTAATT CAAAATCAGCAGAACTTGGTCTTAT 1024
X90694 TGCAACTTTGTGAACTTTGTGAACTCAAATTCTGCAGAACTAGATTTAGC 1035
L36156 TGCAACTT TGTGAACTCAAATTCTGCAGAACTAGATTTAGC 1005
X90692 TG TAATGCTGTGAATGGGAATTCTTC TGGATTGGC 1005

L78163 TAGTGTGGCGTCCAAAGATGCTAAACAAAAGCTTGTTGCTCAATCTAAAT 1057
U41657 TAGTGTGGCGTCCAAAGATGCTAAACAAAAGCTTGTTGCTCAATCTAAAT 850
X90693 CAATGTTGCCTC AGCAG- - ATTCATCTG - AGGAGGGTATGGTTAG - - 1066
X90694 CACCATAGCATCCATAGTAG- -AATCATTAG-AGGATGGTATTGCTAGTG 1082
L36156 CACCATAGCATCCATAGTAG- - AATCATTAG- AGGATGGAATTGCTAGTG 1052
X90692 TACTGTAGTCACCAA AG- - AATCATCAG - AAGATGGAATGGCTAGCT 1049
* .*.* .* .* *..**. .* ..*..* . ... **.

L78163 AAACCAATAATTAATGGGGATGTGCATGCTAGCTAGCATGTAAAGGCAAA 1107
U41657 AAACCAATAATTAATGGGGATGTCGATGCTAGCTACGATGTAAAGGCAAA 900
X90693 CTCAATGTAAA- TG - TAG 1082
X90694 TAATATAAATAAATTAG CGTAAATGC ACTTATTGAA- ATCTTG 1124
L36156 TAATATAAATAAATTAG CGAAAATGCACTTATTGAA- ATCTTG 1094
X90692 CATTCTAAAT - -ATAAG CTTGGAAAATATTGAAGAGGTTCTAT 1090

L78163 TTAGGTTGTAAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTA 1157
U41657 TTAGGTTG - AAACCTCTTTGCTAGCTATATTGAAATAAACCAAAGGAGTA 949
X90693 T- - GATTGGAAGCAACTAA- - TAAATTAAGAAGCTATAAC T 1119
X90694 T- - GACTAGATGCCACTAA- -TAAAT AAGTTATAAC T 1157
L36156 T- -GACTAGATCCCACTAA- -TAAAT AAGTTATAAC T 1127
X90692 A-- ATTTTGTGCATACATA- - TATGGTATGTG 1118
* * . . . ** .


L78163 GTGTGCATGTCAATTCGATTTTGC - CATGTACCTCTTGGAATAT 1200
U41657 GTGTCGATGTCAATTCGATTTTGC - CATGTACCTCTTGGAATATTATGTA 998
X90693 ATGCACATT - CATGGTATGTGTGAGATAGTTATTAGATGCTTTGTGAGCA 1168
X90694 AGGCACATTTCATGTCACTTGAAATTTCATGCCT - GTATATGAG 1200
L36156 AGGCACATTTCATGTCACTTGAAATCCTATGCCTTGTATATTAGAGGACG 1177
X90692 CATGTGGTGTA- - TTATGTTTTTGTTATGTTCTTCAAGTTGATCA 1161

L78163 1200
U41657 ATAATTATTTGAATCTC AAAAAAAAAAAAAAAA 1031
X90693 AAAATCTTTTGGATTTC ATTTGAAGTGTTTCT 1200
X90694 1200
L36156 TGT - TCTT C TTGGTATTATACTA- - T 1200
X90692 GGGA- CTGTAGAAGCTCCCTAATAATATTTGTGTCAAAGT 1200
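
The asterisk rows in Figure 3A mark alignment columns at which the aligned sequences agree. A minimal Python sketch of how such a row might be derived is given below, assuming equal-length aligned rows and marking only columns that are identical in every sequence; the "." marks for partial conservation that appear in the figure are not reproduced. The function name is hypothetical, and the example rows are the first 20 columns of the block ending at position 327 of L78163.

```python
def conservation_line(rows):
    """Return a string with '*' under every column in which all aligned
    rows carry the same nucleotide, and a space elsewhere."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*rows)
    )


# First 20 columns of six aligned rows from Figure 3A.
ROWS = [
    "CACTTCCAAATATCAACTCA",  # L78163
    "CACTTCCAAATATCAACTCA",  # U41657
    "CTTTTCCAAACAGAAACTCA",  # X90693
    "CTTTTCCAAATAACAACTCT",  # X90694
    "CTTTTCCAAATAACAACTCT",  # L36156
    "CACCACCAAATAACAACTCC",  # X90692
]

if __name__ == "__main__":
    for row in ROWS:
        print(row)
    print(conservation_line(ROWS))
```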


Figure 3B 


L78163 MGSMRLLVVALLCAFAMHAGFSVSY AQLTPTFYRETCPNLFPIVFGV 47
U41657 0
X90693 MNSLRAVAIALCCIV- -VVLGGLPFSSNAQLDPSFYRNTCPNVSSIVREV 48
X90694 * MNSL ATSMWCVVLLWLGGLPFSSDAQLSPTFYSKTCPTVSSIVSNV 47
L36156 M WCVVLLWLGGLPFSSDAQLSPTFYSKTCPTVSSIVSNV 40
X90692 MLGLSATA FCCMVFVLIGGVPFS -NAQLDPSFYNSTCSNLDSIVRGV 46

L78163 IFDASFTDPRIGASLMRLHFHDCFVQGCDGSVLLNNTDTIESEQDALPNI 97
U41657 FHDCFVQGCDGSVLLNNTDTIESEQDALPNI 31
X90693 IRSVSKKDPRMLASLVRLHFHDCFVQGCDASVLLbTKTDTV^ 98
X90694 LTNVSKTDPRMLASLVRLHFHDCFVLGCDASVLL^^ 97
L36156 LTNVSKTDPRMLASLVRLHFHDCFVLGCDASVLLNNTATIVSEQQAFPNN 90
X90692 LTNVSQSDPRMLGSLIRLHFHDCFVQGCDASILLNDTATIVSEQSAPPNN 96
****** *** ^ *.***.*.* . *** * **

L78163 NSIRGLDVVNDIKTAVENSCPDTVSCADILAIAAEIASVLGGGPGWPVPL 147
U41657 NSIRGLDVVNDIKTAVENSCPDTVSCADILAIAAEIASVAGRRSGWPVPL 81
X90693 NSLRGLDVVNQIKTAVEKACPNTVSCADILALSAELSSTLADGPDWKVPL 148
X90694 NSLRGLDVVNQIKLAVEVPCPNTVSCADILALAAQASSVLAQGPSWTVPL 147
L36156 NSLRGLDVVNQIKTAVESACPNTVSCADILALA-QASSVLAQGPSWTVPL 139
X90692 NSIRGLDVINQIKTAVENACPNTVSCADILALSAEISSDLANGPTWQVPL 146
** ***** * ** *** **.*********.. . .* . ***

L78163 GRRDSLTANRTLANQNLPAPFFNLTQLKASFAVQGLNTLDLVTLSGGHTF 197
U41657 GRRDSLTANRTLANQNLPAPFFNLTQLKASFAVQGLNTLDLVTLSGGHTS 131
X90693 GRRDGLTANQLLANQNLPAPFNTTDQLKAAFAAQGLDTTDLVALSGAHTF 198
X90694 GRRDGLTANRTLANQNLPAPFNSLDQLKAAFTAQGLNTTDLVALSGAHTF 197
L36156 QRRDGLTANRTLANQNLPAPFNSLDHLKLHLTAQGLITPVLVALSGAHTF 189
X90692 GRRDSLTANNSLAAQNLPAPTFNLTRLKSNFDNQNLSTTDLVALSGGHTI 196
**** ****. **.****** . ..** .. *.*' * **.***.**

L78163 GRARCSTFINRLYNFSNTGNPDPTLNTTYLEVLRARCPQNATGDNLTNLD 247
U41657 GRARCSTFINRLYNFSNTGLIH- -LDTTYLEVLRARCPQNATGDNLTNLD 179
X90693 GRAHCSLFVSRLYNFSGTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 248
X90694 GRAHCAQFVSRLYNFSSTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 247
L36156 GRAHCAQFVSRLYNFSSTGSPDPTLNTTYLQQLRTICPNGGPGTNLTNFD 239
X90692 GRGQCRFFVDRLYNFSNTGNPDSTLNTTYLQTLQAICPNGGPGTNLTDLD 246
**..* *..******.** . *.****. **....*.***..*

L78163 LSTPDQFDNRYYSNLLQLNGLLQSDQELFSTPGADTIPIVNSFSSNQNTF 297
U41657 LSTPDQFDNRYYSNLLQLNGLLQSDQERFSTPGADTIPLSIA-SANQNTF 228
X90693 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGSDTISIVNKFATDQKAF 298
X90694 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGADTISIVNKFSTDQNAF 297
L36156 PTTPDKFDKNYYSNLQVKKGLLQSDQELFSTSGADTISIVDKFSTDQNAF 289
X90692 PTTPDTFDSNYYSNLQVGKGLFQSDQELFSRNGSDTISIVNSFANNQTLF 296
*** ** ***** ******* ** **** _ *

L78163 FSNFRVSMIKMGNIGVLTGDEGEIRLQCNFVN- GDSFGLASVAS - K 341
U41657 FSNFRVSMIKMGNIGVLTGDEGEIRLQCNFVN GDSFGLASVAS -K 272
X90693 FESFRAAMIKMGNIGVLTGNQGEIRKQCNFVN SKSAELGLINVAS - A 344
X90694 FESFKAAMIKMGNIGVLTGTKGEIRKQCNFVNFVNSNSAELDLATIASIV 347
L36156 FESFKAAMIKMGNIGVLTGTKGEIRKQCNFVN SNSAELDLATIASIV 336